Abstract

Properly designed precoders can significantly improve the spectral efficiency of multiple-input
multiple-output (MIMO) relay systems. In this paper, we investigate joint source and relay precoding
design based on the mean-square-error (MSE) criterion in MIMO two-way relay systems, where two
multi-antenna source nodes exchange information via a multi-antenna amplify-and-forward relay node.
This problem is non-convex and its optimal solution remains unknown. Aiming to find an efficient way
to solve the problem, we first decouple the primal problem into three tractable sub-problems, and then
propose an iterative precoding design algorithm based on alternating optimization. The solution to each
sub-problem is optimal and unique, thus the convergence of the iterative algorithm is guaranteed.
Secondly, we propose a structured precoding design to lower the computational complexity. The proposed
precoding structure is able to parallelize the channels in the multiple access (MAC) phase and
broadcast (BC) phase. It thus reduces the precoding design to a simple power allocation problem.
Lastly, for the special case where only a single data stream is transmitted from each source node, we
present a source-antenna-selection (SAS) based precoding design algorithm. This algorithm selects only
one antenna for transmission from each source and thus requires lower signalling overhead.
Comprehensive simulation is conducted to evaluate the effectiveness of all the proposed precoding
designs.
Index Terms

Multiple-input multiple-output (MIMO), precoding, two-way relaying, non-regenerative relay,
minimum mean-square-error (MMSE).

I. Introduction 

Relay-assisted cooperative transmission can offer significant benefits including throughput enhance- 
ment, coverage extension and power reduction in wireless communications. It is therefore considered as 
a promising technique for the next generation wireless communication systems, such as LTE-Advanced 
and WiMAX. Depending on whether the relay can receive and forward signals at the same time and 
frequency, there are two relay modes: full-duplex mode and half-duplex mode. Although the half-duplex 
relay is more favorable for practical implementation, it is less spectrally efficient than the full-duplex relay.
For instance, it will take four time slots for two source nodes to exchange information with the help 
of a half-duplex relay when there is no direct link. To overcome the spectral efficiency loss caused by 
the half-duplex constraint, two-way relaying has been recently proposed [1]–[4]. The notion of two-way
relaying is to apply the principle of network coding at the relay so as to mix the signals received from two 
links for subsequent forwarding, and then apply at each destination the self-interference cancelation to 
extract the desired information. In contrast to the conventional one-way relaying, two-way relaying only 
needs two time slots to complete one round of information exchange. Two-way relay strategies can be 
broadly divided into two categories, decode-and-forward (DF) and amplify-and-forward (AF), similar to 
those in one-way relaying. In DF-based two-way relaying, the relay decodes each individual received bit 
sequence, combines them together using XOR or superposition coding for example and then broadcasts 
to the two destinations. Decoding directly the combined bits may further improve the performance. In 
AF-based two-way relaying, the relay simply amplifies the received superimposed signals and forwards 
to the destinations. Compared with the DF relay strategy, the AF relay strategy is more attractive for its 
simplicity of implementation. 

The multiple-input multiple-output (MIMO) technique is a significant technical breakthrough in wireless 
communications. By employing multiple antennas at the transmitter or the receiver, one can significantly 
improve the transmission reliability by leveraging spatial diversity. If multiple antennas are applied at 
both the transmitter and receiver sides, the channel capacity can be enhanced linearly with the minimum 
number of transmit and receive antennas. Among various MIMO techniques, transmit precoding is able to 
exploit the spatial multiplexing gain efficiently in both single-user and multi-user communication systems 
by making use of channel state information (CSI) at the transmitter. Incorporating the MIMO technique
into two-way relaying is expected to further increase the system throughput. To fully realize the benefits 
of MIMO and two-way relaying, efficient transmit precoding by taking relay nodes into account is crucial. 
In this paper, we consider joint design of source and relay precoding in the MIMO two-way relay system 
where each node is equipped with multiple antennas. 

Recently, a few studies have focused on MIMO two-way relaying. The first category is based on the
DF relay strategy. For example, in [5], the authors investigate and compare the capacity gain for two
different re-encoding operations. In [6], the boundary of the capacity region of Gaussian MIMO two-way
relay broadcast channels is derived. Furthermore, the authors in [7], [8] extend the DF-based MIMO
two-way relay protocol to multi-user and cellular networks. From the aforementioned works, it can be
seen that the precoding design for MIMO two-way relaying under the DF relay strategy does not
differ much from the conventional multi-user MIMO precoding and hence many existing techniques
can be applied. The second category is based on the AF relay strategy. The authors in [9] develop an
algorithm to compute the globally optimal relay beamforming matrix for a system where only the relay
node is equipped with multiple antennas and characterize the system capacity region. In [10], the optimal
relay beamforming matrix is designed to minimize the total mean-square-error (MSE) of two sources.
Under the same design criterion, the authors in [11] consider the scenario with multiple multi-antenna
relay nodes. Different from [10], [11], the works [12]–[14] consider a system where the two source nodes
are also equipped with multiple antennas. In [12], applying the gradient descent algorithm, an iterative
scheme is introduced to find the suboptimal relay precoder for sum-rate maximization. In [13], the
authors consider joint source and relay precoding design to maximize the sum-rate. In [14], the authors
propose a relay transceive precoding scheme by using zero-forcing (ZF) and minimum mean-square-error
(MMSE) criteria with certain antenna configurations. The precoding of MIMO two-way relaying with
AF strategy has also been extended to multi-user networks. For example, the authors investigate the
optimal relay precoding design for a MIMO two-way relay system with multiple pairs of users in [15]
and further study the user scheduling problem in [16]. In [17], the authors design a new network-coded
transmission protocol for the same model as [15] by combining ZF beamforming and signal alignment
such that the intra-pair interference and inter-pair interference can be completely canceled. Other than
using multiple antennas on each node, another way to achieve spatial diversity for AF relay strategy is to
employ network beamforming among multiple single-antenna relay nodes as in [18]–[23]. Nevertheless,
the precoding design for AF MIMO two-way relaying is much more challenging than that for the DF
case.
In this study, we focus on the joint precoding design at both the source and relay nodes for MIMO
two-way relaying with AF strategy. Our goal is to minimize the total mean-square-error (Total-MSE) of
two users by assuming linear processing at both the transmitters and receivers. Different from [10], [11],
we consider a two-way relay system where both the source and relay nodes are equipped with multiple
antennas. Furthermore, we study the joint source and relay precoding design rather than relay precoding
design only. The main contributions of this work are as follows:

• Iterative precoding design: The joint optimization of source and relay precoding for Total-MSE 
minimization is shown to be non-convex and the optimal solution is not easily tractable. We propose 
an iterative algorithm to decouple the joint design problem into three sub-problems and solve each 
of them in an alternating manner. In particular, we derive the optimal relay precoder in closed-form 
when source precoders and decoders are fixed. Since each sub-problem can be solved optimally, the 
convergence of the iterative algorithm is guaranteed. 

• Channel-parallelization based precoding design: We further propose a heuristic channel paralleliza- 
tion (CP) based precoding design algorithm for certain antenna configurations. This method applies 
two joint matrix decomposition techniques so as to parallelize the channels in the multiple access 
(MAC) phase and broadcast (BC) phase, respectively, of two-way relay systems. Certain structures 
are hence imposed on the source and relay precoders. Based on the proposed structure, the joint 
precoding design is reduced to a simple joint source and relay power allocation problem. 

• Source-antenna-selection based precoding design for single-data-stream transmission: For the special
case where only a single data stream is transmitted from each source, we introduce a source-antenna- 
selection (SAS) based precoding design algorithm. We find that the SAS based precoding design 
can even outperform the iterative precoding design in certain scenarios and yet has lower signalling 
overhead. 

The rest of the paper is organized as follows. In Section II, the MIMO two-way relaying model is
introduced. The iterative precoding design algorithm is presented in Section III. Section IV describes 
the channel parallelization method and corresponding power allocation algorithm. The source-antenna- 
selection based precoding algorithm for single data stream is included in Section V. Extensive simulation 
results are illustrated in Section VI. Finally, Section VII offers some concluding remarks. 

Notations: Scalars are denoted by lower-case letters, bold-face lower-case letters are used for vectors,
and bold-face upper-case letters for matrices. $\mathcal{E}[\cdot]$ denotes expectation over the random
variables within the bracket, and $\otimes$ denotes the Kronecker operator. $\mathrm{vec}(\cdot)$ and
$\mathrm{mat}(\cdot)$ signify the matrix vectorization operator and the corresponding inverse operation,
respectively. $\mathrm{Tr}(\mathbf{A})$, $\mathbf{A}^{-1}$ and $\mathrm{Rank}(\mathbf{A})$ stand for the
trace, the inverse and the rank of matrix $\mathbf{A}$, respectively, and $\mathrm{Diag}(\mathbf{a})$
denotes a diagonal matrix with $\mathbf{a}$ being its diagonal entries. Superscripts $(\cdot)^T$,
$(\cdot)^*$ and $(\cdot)^H$ denote transpose, conjugate and conjugate transpose, respectively.
$\mathbf{0}_{N \times M}$ implies the $N \times M$ zero matrix and $\mathbf{I}_N$ denotes the $N \times N$
identity matrix. $\|\mathbf{x}\|_2^2$ denotes the squared Euclidean norm of a complex vector $\mathbf{x}$.
$|z|$ implies the norm of the complex number $z$, and $\Re\{z\}$ and $\Im\{z\}$ denote its real and
imaginary parts, respectively. $\mathbb{C}^{x \times y}$ denotes the space of $x \times y$ matrices
with complex entries. The distribution of a circular symmetric complex Gaussian vector with mean vector
$\mathbf{x}$ and covariance matrix $\mathbf{\Sigma}$ is denoted by $\mathcal{CN}(\mathbf{x}, \mathbf{\Sigma})$.

II. System Model 

Consider an $(N, M, N)$ MIMO two-way relay system where two source nodes, denoted as $S_1$ and $S_2$
and each equipped with $N$ antennas, want to exchange messages through a relay node, denoted as $R$
and equipped with $M$ antennas. The information exchange takes two time slots as shown in Fig. 1. In
the first time slot (also referred to as the MAC phase), the two source nodes $S_1$ and $S_2$ simultaneously
transmit the signals to the relay node $R$. After receiving the superimposed signal, the relay performs
a linear processing by multiplying it with a precoding matrix and then forwards it in the second time
slot (also referred to as the BC phase). Without loss of generality, we assume that $N$ data streams are
transmitted from each source in order to fully utilize the multiplexing gain. The special case with single
data stream transmission shall be investigated in Section V.

Let $\mathbf{x}_i \in \mathbb{C}^{N \times 1}$ denote the transmit signal vector from source $S_i$, for $i = 1, 2$. It can be expressed as

$$\mathbf{x}_i = \mathbf{A}_i \mathbf{s}_i, \quad i = 1, 2$$

where $\mathbf{s}_i \in \mathbb{C}^{N \times 1}$ represents the information signal vector with normalized power, i.e.,
$\mathcal{E}(\mathbf{s}_i \mathbf{s}_i^H) = \mathbf{I}_N$, and $\mathbf{A}_i \in \mathbb{C}^{N \times N}$ denotes the transmit precoding matrix.
Each column of $\mathbf{A}_i$ can be interpreted as the beamforming vector corresponding to the respective
data stream in $\mathbf{s}_i$. The maximum transmission power at $S_i$ is assumed to be $\tau_i$, and thus we have

$$\mathrm{Tr}(\mathbf{A}_i \mathbf{A}_i^H) \le \tau_i, \quad i = 1, 2. \tag{1}$$

Let $\mathbf{y}_r$ denote the received $M \times 1$ signal vector at the relay node during the MAC phase. It can be
expressed as

$$\mathbf{y}_r = \mathbf{H}_1 \mathbf{x}_1 + \mathbf{H}_2 \mathbf{x}_2 + \mathbf{n}_r,$$

where $\mathbf{H}_i \in \mathbb{C}^{M \times N}$ is the full-rank MIMO channel matrix from $S_i$ to $R$, and $\mathbf{n}_r$ denotes the additive
noise vector at the relay node, following the distribution $\mathbf{n}_r \sim \mathcal{CN}(\mathbf{0}, \sigma_r^2 \mathbf{I}_M)$.
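Upon receiving $\mathbf{y}_r$, the relay amplifies it by multiplying it with a precoding matrix $\mathbf{A}_r \in \mathbb{C}^{M \times M}$.
Therefore, the $M \times 1$ transmit signal vector from the relay node can be expressed as

$$\mathbf{x}_r = \mathbf{A}_r \mathbf{y}_r.$$

The maximum transmission power at the relay node is assumed to be $\tau_r$, which yields

$$\mathrm{Tr}\left\{ \mathbf{A}_r \left( \sum_{i=1}^{2} \mathbf{H}_i \mathbf{A}_i \mathbf{A}_i^H \mathbf{H}_i^H + \sigma_r^2 \mathbf{I}_M \right) \mathbf{A}_r^H \right\} \le \tau_r. \tag{2}$$

Then the received signal at $S_i$ during the BC phase can be written as

$$\tilde{\mathbf{y}}_i = \mathbf{G}_i \mathbf{x}_r + \mathbf{n}_i = \mathbf{G}_i \mathbf{A}_r \mathbf{H}_i \mathbf{A}_i \mathbf{s}_i + \mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar i} \mathbf{A}_{\bar i} \mathbf{s}_{\bar i} + \mathbf{G}_i \mathbf{A}_r \mathbf{n}_r + \mathbf{n}_i, \quad i = 1, 2 \tag{3}$$

where $\bar i = 2$ if $i = 1$ and $\bar i = 1$ if $i = 2$, $\mathbf{G}_i \in \mathbb{C}^{N \times M}$ is the full-rank channel matrix from $R$ to $S_i$, and
$\mathbf{n}_i$ denotes the additive noise vector at $S_i$ with $\mathbf{n}_i \sim \mathcal{CN}(\mathbf{0}, \sigma_i^2 \mathbf{I}_N)$. Subtracting the back propagated
self-interference term $\mathbf{G}_i \mathbf{A}_r \mathbf{H}_i \mathbf{A}_i \mathbf{s}_i$ from $\tilde{\mathbf{y}}_i$ yields the equivalent received signal vector at each
destination node as

$$\mathbf{y}_i = \mathbf{F}_i \mathbf{s}_{\bar i} + \mathbf{G}_i \mathbf{A}_r \mathbf{n}_r + \mathbf{n}_i, \quad i = 1, 2 \tag{4}$$

where $\mathbf{F}_i = \mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar i} \mathbf{A}_{\bar i}$ is the equivalent end-to-end MIMO channel matrix for $S_i$.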
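The two-slot signal model can be checked numerically. The following is a minimal NumPy sketch, not part of the original paper: the antenna numbers, the identity precoders, the noise scaling, and the helper `crandn` are illustrative assumptions. It simulates the MAC and BC phases and confirms that subtracting the self-interference from the received signal in (3) leaves exactly the model in (4).

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Hypothetical helper: unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N, M = 2, 3                                   # antennas per source / at the relay (illustrative)
H = [crandn(M, N) for _ in range(2)]          # H_1, H_2: source -> relay channels
G = [crandn(N, M) for _ in range(2)]          # G_1, G_2: relay -> source channels
A = [np.eye(N, dtype=complex) for _ in range(2)]   # source precoders A_1, A_2 (assumed identity)
A_r = 0.5 * np.eye(M, dtype=complex)          # relay precoder (assumed scaled identity)
s = [crandn(N, 1) for _ in range(2)]          # information symbols s_1, s_2
n_r = 0.1 * crandn(M, 1)                      # relay noise
n = [0.1 * crandn(N, 1) for _ in range(2)]    # noise at the two sources

# MAC phase: the relay receives the superimposed signal.
y_r = H[0] @ A[0] @ s[0] + H[1] @ A[1] @ s[1] + n_r
# BC phase: the relay forwards x_r = A_r y_r.
x_r = A_r @ y_r

residuals = []
for i in range(2):
    j = 1 - i                                          # index of the other source
    y_tilde = G[i] @ x_r + n[i]                        # received signal, eq. (3)
    y = y_tilde - G[i] @ A_r @ H[i] @ A[i] @ s[i]      # subtract self-interference
    F = G[i] @ A_r @ H[j] @ A[j]                       # end-to-end channel F_i
    expected = F @ s[j] + G[i] @ A_r @ n_r + n[i]      # right-hand side of eq. (4)
    residuals.append(np.linalg.norm(y - expected))
```

Both entries of `residuals` should be at the level of floating-point rounding, since (4) is an algebraic identity once the self-interference term is known at the source.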

The problem in this study is the joint design of the precoding matrices $\{\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_r\}$ given the global CSI
$\{\mathbf{H}_1, \mathbf{H}_2, \mathbf{G}_1, \mathbf{G}_2\}$ based on the MSE criterion. Specifically, the objective is to minimize the Total-MSE
of all the data streams of two users. The Total-MSE has been widely chosen as a criterion for precoding
design in the literature, e.g., [14]–[16], [24]–[27]. Although it may not be the best criterion from the
overall performance aspect [28], the advantage of using Total-MSE is that one can obtain the optimal
precoder structure or even the closed-form solution for the precoders in some cases (see [24], [25]).
For the considered MIMO two-way relay system, we show that the closed-form relay precoder can be
obtained under the Total-MSE criterion for given source precoders and decoders.

Before leaving this section, we provide some discussions on the signalling overhead for obtaining the
CSI and the precoding information in the system. First of all, we assume that the channel characteristics
of each link change slowly enough so that they can be perfectly estimated by using pilot symbols or
training sequences. If the channel reciprocity holds during the MAC phase and BC phase (e.g., they are
in time-division duplex mode) with $\mathbf{G}_1 = \mathbf{H}_1^T$ and $\mathbf{G}_2 = \mathbf{H}_2^T$, then the relay only needs to estimate the
channel parameters during the MAC phase and the global CSI can be obtained. As a result, the joint
precoding design can be conducted at the relay node and then the relay node broadcasts $\mathbf{A}_i$ to $S_i$, $i = 1, 2$.
To cancel self-interference and demodulate the received signals, the source nodes should estimate the
corresponding channel parameters. For example, $S_1$ needs to estimate $\mathbf{G}_1 \mathbf{A}_r \mathbf{H}_1$ to subtract the
self-interference $\mathbf{s}_1$ and estimate $\mathbf{G}_1 \mathbf{A}_r \mathbf{H}_2$ to demodulate $\mathbf{s}_2$. If, on the other hand, the channel reciprocity
does not hold during the MAC phase and BC phase (e.g., they are in frequency-division duplex mode),
more feedback channels and signalling overheads are required. The relay can only estimate $\mathbf{H}_1$ and $\mathbf{H}_2$
during the MAC phase. To obtain the global CSI, the relay node needs $S_1$ and $S_2$ to feed back $\mathbf{G}_1$ and
$\mathbf{G}_2$, respectively.

III. Iterative Precoding Design 

In this section, we first formulate the joint optimization of the source and relay precoding for Total-MSE 
minimization in the considered MIMO two-way relay systems. This problem is shown to be non-linear 
and non-convex and the optimal solution is not easily tractable. To approach the global optimal solution, 
we propose an iterative algorithm based on alternating optimization that updates one precoder at a time 
while fixing the others. 

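According to the received signal in (4) and assuming a linear receiver, the MSE at $S_i$ can be written as

$$J_i = \mathcal{E}\left\{ \left\| \mathbf{W}_i \mathbf{y}_i - \mathbf{s}_{\bar i} \right\|_2^2 \right\}, \quad i = 1, 2 \tag{5}$$

where $\mathbf{W}_i \in \mathbb{C}^{N \times N}$ is the linear decoding matrix at the destination $S_i$. Substituting (4) into (5), it further
yields

$$\begin{aligned}
J_i &= \mathcal{E}\left\{ \left\| \mathbf{W}_i \left( \mathbf{F}_i \mathbf{s}_{\bar i} + \mathbf{G}_i \mathbf{A}_r \mathbf{n}_r + \mathbf{n}_i \right) - \mathbf{s}_{\bar i} \right\|_2^2 \right\} \\
&= \mathrm{Tr}\left\{ \mathbf{W}_i \mathbf{F}_i \mathbf{F}_i^H \mathbf{W}_i^H - \mathbf{W}_i \mathbf{F}_i - \mathbf{F}_i^H \mathbf{W}_i^H + \sigma_r^2 \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_i^H \mathbf{W}_i^H + \sigma_i^2 \mathbf{W}_i \mathbf{W}_i^H + \mathbf{I}_N \right\}, \quad i = 1, 2
\end{aligned} \tag{6}$$

where we have used the fact that $\mathbf{s}_{\bar i}$, $\mathbf{n}_i$ and $\mathbf{n}_r$ are mutually independent. The problem is to find the
optimal precoding/decoding matrices $\{\mathbf{A}_r, \mathbf{A}_i, \mathbf{W}_i, i = 1, 2\}$ such that the Total-MSE of the two users
can be minimized. This is formulated as

$$\min_{\mathbf{A}_r, \mathbf{A}_i, \mathbf{W}_i,\, i = 1, 2} \; J_1 + J_2 \quad \text{s.t. (1), (2).} \tag{7}$$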

Before solving (7), we present the following theorem. Based on this theorem, we only consider the case
$M \ge N$ throughout this paper.

Theorem 1: When $M \ge N$, the Total-MSE $J_1 + J_2$ can be made arbitrarily small by increasing the
power at both the source nodes and the relay node in the considered $(N, M, N)$ two-way relay system.
Otherwise if $M < N$, $J_1 + J_2$ is always lower bounded by $2(N - M)$.
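Proof: We first provide an alternative expression for the MSE of each source, $J_i$. Since the constraints
do not involve the decoding matrix $\mathbf{W}_i$ in the problem formulation (7), a necessary condition for the
optimal solution is $\partial (J_1 + J_2) / \partial \mathbf{W}_i^* = \mathbf{0}$. By using the matrix differentiation rules in [29], the optimal solution of
$\mathbf{W}_i$, denoted as $\mathbf{W}_i^{\mathrm{opt}}$, can be expressed in closed-form as

$$\mathbf{W}_i^{\mathrm{opt}} = \mathbf{F}_i^H \mathbf{R}_{y_i}^{-1}, \quad i = 1, 2 \tag{8}$$

where

$$\mathbf{R}_{y_i} = \mathbf{F}_i \mathbf{F}_i^H + \sigma_r^2 \mathbf{G}_i \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_i^H + \sigma_i^2 \mathbf{I}_N. \tag{9}$$

By substituting $\mathbf{W}_i^{\mathrm{opt}}$ in (8) into (6), the MSE at $S_i$, $J_i$, transforms into

$$J_i = \mathrm{Tr}\left\{ \left[ \mathbf{I}_N + \mathbf{F}_i^H \left( \sigma_i^2 \mathbf{I}_N + \sigma_r^2 \mathbf{G}_i \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_i^H \right)^{-1} \mathbf{F}_i \right]^{-1} \right\}, \quad i = 1, 2. \tag{10}$$

Therefore, the minimum Total-MSE $J_1 + J_2$ of the original problem (7) will be the same as the minimum
of $J_1 + J_2$ with $J_i$ given in (10), subject to the same power constraints. For brevity of illustration, we take $J_1$ as an example.
Define $\mathbf{Q} = \left( \sigma_1^2 \mathbf{I}_N + \sigma_r^2 \mathbf{G}_1 \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_1^H \right)^{-1}$ for simplicity of notation. Note that the rank of $\mathbf{Q}$ is equal
to $N$. When $M \ge N$, it is always possible to find precoders $\{\mathbf{A}_r, \mathbf{A}_2\}$ to make the rank of the term
$\mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1$ equal to $N$. Let $a_n$, $n = 1, 2, \cdots, N$, denote the positive eigenvalues of $\mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1$, then $J_1$ can
be rewritten as

$$J_1 = \mathrm{Tr}\left\{ \left[ \mathbf{I}_N + \mathrm{Diag}([a_1, a_2, \ldots, a_N]) \right]^{-1} \right\} = \sum_{n=1}^{N} \frac{1}{1 + a_n}. \tag{11}$$

Next, we prove that by increasing the power at both $S_2$ and $R$, we can always increase $a_n$ and hence
decrease the MSE $J_1$. Let us define

$$\begin{aligned}
\mathbf{E} &= \mathbf{I}_N + \mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1 \\
&= \mathbf{I}_N + \theta_2 \theta_r \bar{\mathbf{A}}_2^H \mathbf{H}_2^H \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \left[ \sigma_1^2 \mathbf{I}_N + \theta_r \sigma_r^2 \mathbf{G}_1 \bar{\mathbf{A}}_r \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \right]^{-1} \mathbf{G}_1 \bar{\mathbf{A}}_r \mathbf{H}_2 \bar{\mathbf{A}}_2,
\end{aligned}$$

where we have replaced $\mathbf{F}_1$ by $\mathbf{G}_1 \mathbf{A}_r \mathbf{H}_2 \mathbf{A}_2$ as defined in (4) when obtaining the second equality and
set $\mathbf{A}_2 = \sqrt{\theta_2} \bar{\mathbf{A}}_2$ and $\mathbf{A}_r = \sqrt{\theta_r} \bar{\mathbf{A}}_r$ with $\theta_2$ and $\theta_r$ being power scalar parameters for $\mathbf{A}_2$ and $\mathbf{A}_r$,
respectively. Then, we can rewrite the MSE in (10) as $J_1 = \mathrm{Tr}\{\mathbf{E}^{-1}\}$. It is easy to verify that enlarging
$\theta_2$ can always increase the eigenvalues $a_n$ to decrease $J_1$. However, due to the power constraint at the
relay, we also need to check how $\theta_r$ affects $J_1$. By defining $\beta = 1/\theta_r$, we rewrite $\mathbf{E}$ as

$$\mathbf{E} = \mathbf{I}_N + \theta_2 \bar{\mathbf{A}}_2^H \mathbf{H}_2^H \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \left( \beta \sigma_1^2 \mathbf{I}_N + \sigma_r^2 \mathbf{G}_1 \bar{\mathbf{A}}_r \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \right)^{-1} \mathbf{G}_1 \bar{\mathbf{A}}_r \mathbf{H}_2 \bar{\mathbf{A}}_2.$$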
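Then, we have

$$\begin{aligned}
\frac{\partial\, \mathrm{Tr}(\mathbf{E}^{-1})}{\partial \beta}
&= \mathrm{Tr}\left\{ -\mathbf{E}^{-1} \frac{\partial}{\partial \beta} \Big[ \theta_2 \bar{\mathbf{A}}_2^H \mathbf{H}_2^H \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \big( \underbrace{\sigma_r^2 \mathbf{G}_1 \bar{\mathbf{A}}_r \bar{\mathbf{A}}_r^H \mathbf{G}_1^H + \beta \sigma_1^2 \mathbf{I}_N}_{\mathbf{P}} \big)^{-1} \mathbf{G}_1 \bar{\mathbf{A}}_r \mathbf{H}_2 \bar{\mathbf{A}}_2 \Big] \mathbf{E}^{-1} \right\} \\
&= \mathrm{Tr}\left\{ \sigma_1^2 \theta_2 \mathbf{E}^{-1} \bar{\mathbf{A}}_2^H \mathbf{H}_2^H \bar{\mathbf{A}}_r^H \mathbf{G}_1^H \mathbf{P}^{-2} \mathbf{G}_1 \bar{\mathbf{A}}_r \mathbf{H}_2 \bar{\mathbf{A}}_2 \mathbf{E}^{-1} \right\} \ge 0,
\end{aligned}$$

where we have used the fact that both $\mathbf{E}$ and $\mathbf{P}$ are positive definite. Therefore, we conclude that $J_1$ is
a monotonically decreasing function with respect to $\theta_r$. It suggests that enlarging $\theta_r$ can also increase $a_n$
and decrease $J_1$.

Secondly, we show that if $M \ge N$, the MSE $J_1$ can be made arbitrarily small by increasing the power
at both source and relay nodes. To this end, we simply assume that when increasing the power at $S_2$
(i.e., increasing the scalar $\theta_2$), the relay just increases its power to keep $\theta_r$ unchanged. Thus, similar to
(11), we have

$$J_1 = \sum_{n=1}^{N} \frac{1}{1 + \theta_2 \bar{a}_n},$$

where $\bar{a}_n$, $n = 1, 2, \cdots, N$, are the eigenvalues of $\bar{\mathbf{A}}_2^H \mathbf{H}_2^H \mathbf{A}_r^H \mathbf{G}_1^H \mathbf{Q} \mathbf{G}_1 \mathbf{A}_r \mathbf{H}_2 \bar{\mathbf{A}}_2$. For an arbitrarily small
$\epsilon_1 > 0$, by defining $\bar{a}_{\min} = \min\{\bar{a}_1, \bar{a}_2, \cdots, \bar{a}_N\}$, we can always have

$$\sum_{n=1}^{N} \frac{1}{1 + \theta_2 \bar{a}_n} \le \frac{N}{1 + \theta_2 \bar{a}_{\min}} \le \epsilon_1 \quad \text{if} \quad \theta_2 \ge \frac{N - \epsilon_1}{\epsilon_1 \bar{a}_{\min}}.$$

On the other hand, if $M < N$, the maximum rank of the term $\mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1$ in $J_1$ is $M$. Assuming that the
$M$ non-zero eigenvalues of $\mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1$ are denoted by $\{b_1, b_2, \cdots, b_M\}$, the resultant $J_1$ can be expressed
as

$$J_1 = (N - M) + \sum_{n=1}^{M} \frac{1}{1 + b_n}.$$

No matter how much power is provided at the source and relay nodes, $J_1$ is always lower bounded by
$N - M$. The same bound holds for $J_2$. Theorem 1 is thus proven. ■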
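The two regimes of Theorem 1 can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the authors' simulation: it uses scaled-identity precoders $\mathbf{A}_2 = \sqrt{\theta}\,\mathbf{I}_N$ and $\mathbf{A}_r = \sqrt{\theta}\,\mathbf{I}_M$, equal noise variances, random channels, and the hypothetical helpers `crandn`, `mse_user1` and `sweep`. It evaluates $J_1$ from (10) through the eigenvalues of $\mathbf{F}_1^H \mathbf{Q} \mathbf{F}_1$ as in (11).

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Hypothetical helper: unit-variance complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mse_user1(H2, G1, theta, sigma2=1.0):
    """J_1 from eq. (10) with A_2 = sqrt(theta) I_N and A_r = sqrt(theta) I_M (assumed)."""
    M, N = H2.shape
    A2 = np.sqrt(theta) * np.eye(N)
    Ar = np.sqrt(theta) * np.eye(M)
    F = G1 @ Ar @ H2 @ A2                                  # end-to-end channel F_1
    GA = G1 @ Ar
    C = sigma2 * np.eye(N) + sigma2 * GA @ GA.conj().T     # Q^{-1}
    X = F.conj().T @ np.linalg.solve(C, F)                 # F_1^H Q F_1
    a = np.linalg.eigvalsh((X + X.conj().T) / 2)           # eigenvalues a_n
    return float(np.sum(1.0 / (1.0 + a)))                  # eq. (11)

thetas = [1.0, 10.0, 100.0, 1000.0]                        # increasing power scalings

def sweep(N, M):
    H2, G1 = crandn(M, N), crandn(N, M)
    return np.array([mse_user1(H2, G1, t) for t in thetas])

J_wide = sweep(N=2, M=3)      # M >= N: MSE keeps shrinking with power
J_narrow = sweep(N=3, M=2)    # M < N: MSE saturates at or above N - M = 1
```

With $M \ge N$ the values in `J_wide` decrease as the power grows, while with $M < N$ the values in `J_narrow` never fall below $N - M$, in line with the rank argument in the proof.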
We now take a closer look at the problem (7), which can be proven to be non-linear and non-convex
and hence is difficult to solve. To make the problem tractable, we propose an iterative algorithm which
decouples the primal problem into three sub-problems and solves each of them in an alternating
manner.

First, given the precoding matrices at the source and relay nodes, i.e., $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_r$, we try to find
the optimal decoder matrices $\mathbf{W}_1$ and $\mathbf{W}_2$. Since the power constraints in (1) and (2) are not related to
$\mathbf{W}_1$ and $\mathbf{W}_2$, we simply get an unconstrained optimization problem

$$\min_{\mathbf{W}_1, \mathbf{W}_2} \; J_{w_1} + J_{w_2} \tag{14}$$

where $J_{w_i} = \mathrm{Tr}\left\{ \mathbf{W}_i \mathbf{R}_{y_i} \mathbf{W}_i^H - \mathbf{W}_i \mathbf{F}_i - \mathbf{F}_i^H \mathbf{W}_i^H + \mathbf{I}_N \right\}$, $i = 1, 2$, with $\mathbf{R}_{y_i}$ given in (9). Since $\mathbf{R}_{y_i}$,
$i = 1, 2$, is positive definite, the objective function in (14) is convex with respect to $\mathbf{W}_i$. Therefore,
applying the Karush-Kuhn-Tucker (KKT) conditions, we obtain the optimal decoding matrices as described
in (8) by equating the gradient of the objective function in (14) to zero.
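As a numerical sanity check of the decoder sub-problem, the following sketch computes the MMSE decoder from (8)-(9) and compares the MSE in (6) evaluated at that decoder with the closed form (10). The channel sizes, noise variances, precoders, and function names (`crandn`, `mse_general`, `mmse_decoder`, `mse_mmse`) are illustrative assumptions, not taken from the paper. The MSE in (6) is written with Frobenius norms, which is algebraically the same expression.

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    """Hypothetical helper: unit-variance complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mse_general(W, F, G, A_r, sig_r2, sig_i2):
    """Eq. (6): MSE of an arbitrary linear decoder W (Frobenius-norm form)."""
    N = F.shape[0]
    return (np.linalg.norm(W @ F - np.eye(N)) ** 2
            + sig_r2 * np.linalg.norm(W @ G @ A_r) ** 2
            + sig_i2 * np.linalg.norm(W) ** 2)

def mmse_decoder(F, G, A_r, sig_r2, sig_i2):
    """Eqs. (8)-(9): W_opt = F^H R_y^{-1}."""
    N = F.shape[0]
    GA = G @ A_r
    R_y = F @ F.conj().T + sig_r2 * GA @ GA.conj().T + sig_i2 * np.eye(N)
    return F.conj().T @ np.linalg.inv(R_y)

def mse_mmse(F, G, A_r, sig_r2, sig_i2):
    """Eq. (10): MSE after the optimal decoder has been substituted."""
    N = F.shape[0]
    GA = G @ A_r
    C = sig_i2 * np.eye(N) + sig_r2 * GA @ GA.conj().T
    E = np.eye(N) + F.conj().T @ np.linalg.solve(C, F)
    return np.trace(np.linalg.inv(E)).real

N, M = 2, 3
sig_r2, sig_1_2 = 0.5, 0.2                     # assumed noise variances
H2, G1 = crandn(M, N), crandn(N, M)
A2, A_r = np.eye(N), 0.7 * np.eye(M)           # assumed precoders
F1 = G1 @ A_r @ H2 @ A2

W1 = mmse_decoder(F1, G1, A_r, sig_r2, sig_1_2)
J_eq6 = mse_general(W1, F1, G1, A_r, sig_r2, sig_1_2)
J_eq10 = mse_mmse(F1, G1, A_r, sig_r2, sig_1_2)
# Any other decoder should do worse, since (14) is a convex quadratic in W.
J_perturbed = mse_general(W1 + 0.1 * crandn(N, N), F1, G1, A_r, sig_r2, sig_1_2)
```

`J_eq6` and `J_eq10` should agree up to rounding, and `J_perturbed` should be strictly larger because the objective is a convex quadratic with a positive definite $\mathbf{R}_{y_i}$.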

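Second, we consider the optimization of the relay precoding matrix $\mathbf{A}_r$ by assuming that $\mathbf{W}_i$, $\mathbf{A}_i$,
$i = 1, 2$, are fixed. This sub-problem is equivalent to

$$\min_{\mathbf{A}_r} \; J_{r_1} + J_{r_2} \tag{15}$$
$$\text{s.t.} \quad \mathrm{Tr}\left\{ \mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H \right\} \le \tau_r \tag{16}$$

where $J_{r_i}$ is obtained by replacing $\mathbf{F}_i$ in (6) with $\mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar i} \mathbf{A}_{\bar i}$ as defined in (4) and using the circular
property of the trace operator $\mathrm{Tr}\{\mathbf{A}\mathbf{B}\} = \mathrm{Tr}\{\mathbf{B}\mathbf{A}\}$, given by

$$\begin{aligned}
J_{r_i} = \mathrm{Tr}\big\{ & \mathbf{G}_i^H \mathbf{W}_i^H \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{R}_{x_{\bar i}} \mathbf{A}_r^H - \mathbf{H}_{\bar i} \mathbf{A}_{\bar i} \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \\
& - \mathbf{G}_i^H \mathbf{W}_i^H \mathbf{A}_{\bar i}^H \mathbf{H}_{\bar i}^H \mathbf{A}_r^H + \sigma_i^2 \mathbf{W}_i \mathbf{W}_i^H + \mathbf{I}_N \big\}, \quad i = 1, 2
\end{aligned} \tag{17}$$

with $\mathbf{R}_{x_{\bar i}} = \mathbf{H}_{\bar i} \mathbf{A}_{\bar i} \mathbf{A}_{\bar i}^H \mathbf{H}_{\bar i}^H + \sigma_r^2 \mathbf{I}_M$, and (16) refers to the relay power constraint defined in (2) with

$$\mathbf{R}_x = \mathbf{H}_1 \mathbf{A}_1 \mathbf{A}_1^H \mathbf{H}_1^H + \mathbf{H}_2 \mathbf{A}_2 \mathbf{A}_2^H \mathbf{H}_2^H + \sigma_r^2 \mathbf{I}_M.$$

Note that the source power constraints (1) are irrelevant here since $\mathbf{A}_1$ and $\mathbf{A}_2$ are fixed.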

Lemma 1: The problem of relay precoding design given source precoders and decoders for Total-MSE
minimization in the considered $(N, M, N)$ MIMO two-way relay system as formulated in (15) is convex.
Proof: Please refer to Appendix A. ■

Due to the convexity of the problem (15), we can readily design the optimal relay precoder by
employing the KKT conditions. Specifically, the Lagrangian function of (15) is given as

$$\mathcal{L} = J_{r_1} + J_{r_2} + \lambda \left( \mathrm{Tr}\left\{ \mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H \right\} - \tau_r \right),$$

where $\lambda \ge 0$ is the Lagrangian multiplier. Thus, the KKT conditions are

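$$\frac{\partial \mathcal{L}}{\partial \mathbf{A}_r^*} = \mathbf{R}_{r_1} \mathbf{A}_r \mathbf{R}_{x_2} + \mathbf{R}_{r_2} \mathbf{A}_r \mathbf{R}_{x_1} - \mathbf{R}_r + \lambda \mathbf{A}_r \mathbf{R}_x = \mathbf{0}, \tag{18}$$
$$\lambda \left( \mathrm{Tr}\left\{ \mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H \right\} - \tau_r \right) = 0, \tag{19}$$
$$\mathrm{Tr}\left\{ \mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H \right\} \le \tau_r, \tag{20}$$

where $\mathbf{R}_r = \mathbf{G}_1^H \mathbf{W}_1^H \mathbf{A}_2^H \mathbf{H}_2^H + \mathbf{G}_2^H \mathbf{W}_2^H \mathbf{A}_1^H \mathbf{H}_1^H$ and $\mathbf{R}_{r_i} = \mathbf{G}_i^H \mathbf{W}_i^H \mathbf{W}_i \mathbf{G}_i$, $i = 1, 2$. To obtain (18),
the differentiation rule $\partial\, \mathrm{Tr}(\mathbf{Z} \mathbf{A}_0 \mathbf{Z}^H \mathbf{A}_1) / \partial \mathbf{Z}^* = \mathbf{A}_1 \mathbf{Z} \mathbf{A}_0$ in [29] is applied.
Based on (18) we further obtain

$$\mathbf{A}_r^{\mathrm{opt}} = \mathrm{mat}\left\{ \left[ \mathbf{R}_{x_2}^T \otimes \mathbf{R}_{r_1} + \mathbf{R}_{x_1}^T \otimes \mathbf{R}_{r_2} + \lambda \mathbf{R}_x^T \otimes \mathbf{I}_M \right]^{-1} \mathrm{vec}(\mathbf{R}_r) \right\}. \tag{21}$$

In the special case when $\lambda = 0$, we have

$$\mathbf{A}_r^{\mathrm{opt}} = \mathrm{mat}\left\{ \left[ \mathbf{R}_{x_2}^T \otimes \mathbf{R}_{r_1} + \mathbf{R}_{x_1}^T \otimes \mathbf{R}_{r_2} \right]^{-1} \mathrm{vec}(\mathbf{R}_r) \right\}. \tag{22}$$

If $\mathbf{A}_r^{\mathrm{opt}}$ in (22) meets the condition (20), then (22) is the optimal relay precoder. Otherwise, $\lambda$ in (21)
should be chosen to satisfy $\mathrm{Tr}\left\{ \mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H \right\} = \tau_r$.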

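Lemma 2: The function $g(\lambda) = \mathrm{Tr}\left\{ \mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H \right\}$, with $\mathbf{A}_r$ given by (21), is monotonically decreasing
with respect to $\lambda$ and the optimal $\lambda$ is upper-bounded by $\sqrt{\mathrm{Tr}\left( \mathbf{R}_r \mathbf{R}_x^{-1} \mathbf{R}_r^H \right) / \tau_r}$.

Proof: Please refer to Appendix B. ■

With Lemma 2, the optimal $\lambda$ meeting the condition $\mathrm{Tr}\left\{ \mathbf{A}_r \mathbf{R}_x \mathbf{A}_r^H \right\} = \tau_r$ can be readily obtained
using bisection search.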
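The relay update can be sketched in a few lines of NumPy. The code below is an illustrative implementation of (21)-(22) with a bisection search on $\lambda$, not the authors' code. The function name `relay_precoder`, the helper names, the tolerance, and the test channels are assumptions, and the upper end of the search interval uses the bound $g(\lambda) \le \mathrm{Tr}(\mathbf{R}_r \mathbf{R}_x^{-1} \mathbf{R}_r^H)/\lambda^2$, which follows from (21) because the two Kronecker terms without $\lambda$ are positive semidefinite. It also assumes that the matrix inverted in (22) is non-singular.

```python
import numpy as np

def herm(X):
    return X.conj().T

def relay_precoder(H, G, A, W, sig_r2, tau_r, tol=1e-10):
    """Sketch of eqs. (21)-(22); returns (A_r, lam). Lists are indexed 0 -> node 1, 1 -> node 2."""
    M = H[0].shape[0]
    I_M = np.eye(M)
    Rx_i = [H[i] @ A[i] @ herm(H[i] @ A[i]) + sig_r2 * I_M for i in range(2)]   # R_{x_1}, R_{x_2}
    Rx = Rx_i[0] + Rx_i[1] - sig_r2 * I_M                                       # R_x
    Rr_i = [herm(W[i] @ G[i]) @ (W[i] @ G[i]) for i in range(2)]                # R_{r_1}, R_{r_2}
    Rr = herm(H[1] @ A[1] @ W[0] @ G[0]) + herm(H[0] @ A[0] @ W[1] @ G[1])      # R_r
    K = np.kron(Rx_i[1].T, Rr_i[0]) + np.kron(Rx_i[0].T, Rr_i[1])
    S = np.kron(Rx.T, I_M)
    r = Rr.reshape(-1, order="F")                    # vec(R_r), column-major stacking

    def solve(lam):
        a = np.linalg.solve(K + lam * S, r)          # assumes K + lam*S is non-singular
        return a.reshape(M, M, order="F")            # mat(.)

    def power(Ar):
        return np.trace(Ar @ Rx @ herm(Ar)).real     # g(lam)

    Ar = solve(0.0)
    if power(Ar) <= tau_r:                           # eq. (22) already feasible
        return Ar, 0.0
    lo = 0.0
    hi = np.sqrt(np.trace(Rr @ np.linalg.solve(Rx, herm(Rr))).real / tau_r)
    while hi - lo > tol * max(hi, 1.0):              # g is decreasing in lam
        mid = 0.5 * (lo + hi)
        if power(solve(mid)) > tau_r:
            lo = mid
        else:
            hi = mid
    return solve(hi), hi                             # feasible side of the bracket

rng = np.random.default_rng(2)
crandn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
N, M, sig_r2, tau_r = 2, 3, 0.1, 1.0
H = [crandn(M, N) for _ in range(2)]
G = [crandn(N, M) for _ in range(2)]
A = [np.eye(N), np.eye(N)]                           # assumed fixed source precoders
W = [crandn(N, N) for _ in range(2)]                 # assumed fixed decoders
A_r, lam = relay_precoder(H, G, A, W, sig_r2, tau_r)
Rx = sum(H[i] @ A[i] @ herm(H[i] @ A[i]) for i in range(2)) + sig_r2 * np.eye(M)
relay_power = np.trace(A_r @ Rx @ herm(A_r)).real
```

Returning the solution at the upper end of the bracket keeps the relay power on the feasible side of (20); the complementary slackness condition (19) then holds up to the bisection tolerance.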

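The third sub-problem is to optimize the source precoder $\mathbf{A}_i$ for fixed $\mathbf{A}_r$ and $\mathbf{W}_i$, $i = 1, 2$. This is
formulated as:

$$\min_{\mathbf{A}_1, \mathbf{A}_2} \; J_{s_1} + J_{s_2} \tag{23}$$
$$\text{s.t.} \quad \mathrm{Tr}\left\{ \mathbf{A}_i \mathbf{A}_i^H \right\} \le \tau_i, \quad i = 1, 2$$
$$\mathrm{Tr}\left\{ \mathbf{R}_{p_1} \mathbf{A}_1 \mathbf{A}_1^H + \mathbf{R}_{p_2} \mathbf{A}_2 \mathbf{A}_2^H \right\} \le \tau_r', \tag{24}$$

where $\tau_r' = \tau_r - \sigma_r^2 \mathrm{Tr}\left\{ \mathbf{A}_r \mathbf{A}_r^H \right\}$, $\mathbf{R}_{p_i} = \mathbf{H}_i^H \mathbf{A}_r^H \mathbf{A}_r \mathbf{H}_i$, $i = 1, 2$ and

$$J_{s_i} = \mathrm{Tr}\left\{ \mathbf{R}_{s_i,1} \mathbf{A}_{\bar i} \mathbf{A}_{\bar i}^H - 2 \Re\left( \mathbf{R}_{s_i,2} \mathbf{A}_{\bar i} \right) + \mathbf{R}_{s_i,3} \right\}, \quad i = 1, 2 \tag{25}$$

with

$$\mathbf{R}_{s_i,1} = \mathbf{H}_{\bar i}^H \mathbf{A}_r^H \mathbf{G}_i^H \mathbf{W}_i^H \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar i},$$
$$\mathbf{R}_{s_i,2} = \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{H}_{\bar i},$$
$$\mathbf{R}_{s_i,3} = \sigma_r^2 \mathbf{W}_i \mathbf{G}_i \mathbf{A}_r \mathbf{A}_r^H \mathbf{G}_i^H \mathbf{W}_i^H + \sigma_i^2 \mathbf{W}_i \mathbf{W}_i^H + \mathbf{I}_N.$$

To obtain (25), the circular property of the trace operator is again applied to (6).

It is noted that the change of source precoders can affect the power constraint at the relay. Hence, the
relay power constraint should be included as (24) in (23). By applying the conclusion derived in Lemma
A (given in Appendix A), we can also prove that the optimization problem (23) is convex.
Lemma 3: The optimization problem in the form of (23) can be transformed into a convex quadratically
constrained quadratic program (QCQP) problem.

Proof: Please refer to Appendix C. ■
A QCQP problem can be efficiently solved by applying the available software package [30].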
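To see why (23) has a QCQP form, one can vectorize the precoder. The sketch below shows one standard vectorization, which is an assumption here since Appendix C is not reproduced in this section: with $\mathbf{a} = \mathrm{vec}(\mathbf{A})$, the quadratic term becomes $\mathbf{a}^H (\mathbf{I}_N \otimes \mathbf{R}_1) \mathbf{a}$, the linear term becomes $\mathrm{vec}(\mathbf{R}_2^H)^H \mathbf{a}$, and the power constraint becomes $\mathbf{a}^H \mathbf{a}$. The matrices `R1` and `R2` are random stand-ins for $\mathbf{R}_{s_i,1}$ and $\mathbf{R}_{s_i,2}$.

```python
import numpy as np

rng = np.random.default_rng(5)

def crandn(*shape):
    """Hypothetical helper: unit-variance complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N = 3
X = crandn(N, N)
R1 = X @ X.conj().T                 # Hermitian PSD stand-in for R_{s_i,1}
R2 = crandn(N, N)                   # stand-in for R_{s_i,2}
A = crandn(N, N)                    # candidate source precoder

a = A.reshape(-1, order="F")                    # a = vec(A), column-major
Q = np.kron(np.eye(N), R1)                      # quadratic form matrix I_N (x) R1
b = R2.conj().T.reshape(-1, order="F")          # b = vec(R2^H)

# Matrix form of the A-dependent part of eq. (25) ...
obj_matrix = np.trace(R1 @ A @ A.conj().T - 2 * (R2 @ A).real).real
# ... and the equivalent quadratic form in the vector a.
obj_vector = (a.conj() @ Q @ a).real - 2 * (b.conj() @ a).real

power_matrix = np.trace(A @ A.conj().T).real    # Tr{A A^H}
power_vector = (a.conj() @ a).real              # a^H a
min_eig_Q = np.linalg.eigvalsh(Q).min()         # Q is PSD, so the objective is convex in a
```

Because `Q` inherits positive semidefiniteness from `R1`, the objective is a convex quadratic in `a`, and the power constraints are convex quadratic constraints of the same type.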
In summary, we outline the iterative precoding design algorithm as follows: 

Algorithm 1 (Iterative precoding) 

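• Initialize $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_r$.³

• Repeat

- Update the decoder matrices $\mathbf{W}_1$ and $\mathbf{W}_2$ using (8) for fixed $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_r$;

- Update the relay precoder matrix $\mathbf{A}_r$ using (21) or (22) for fixed $\mathbf{A}_1$, $\mathbf{A}_2$, $\mathbf{W}_1$ and $\mathbf{W}_2$;

- For fixed $\mathbf{A}_r$, $\mathbf{W}_1$ and $\mathbf{W}_2$, solve the convex QCQP problem to get the optimal $\mathbf{A}_1$ and $\mathbf{A}_2$ as in Appendix C;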

• Until termination criterion is satisfied. 
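The alternating structure of Algorithm 1 can be sketched as follows. This is a deliberately reduced illustration, not the full algorithm: the source precoders are held fixed at identity matrices (the QCQP step is omitted because it needs an external solver), so only the decoder update (8) and the relay update (21)-(22) alternate. All function names, sizes and parameter values are assumptions. Even in this reduced form, the recorded Total-MSE should be non-increasing across iterations.

```python
import numpy as np

rng = np.random.default_rng(3)

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def herm(X):
    return X.conj().T

def decoders(H, G, A, Ar, s2):
    """MMSE decoders of eq. (8) for both users (equal noise variances s2 assumed)."""
    W = []
    for i in range(2):
        F = G[i] @ Ar @ H[1 - i] @ A[1 - i]
        GA = G[i] @ Ar
        R = F @ herm(F) + s2 * GA @ herm(GA) + s2 * np.eye(F.shape[0])
        W.append(herm(F) @ np.linalg.inv(R))
    return W

def total_mse(H, G, A, Ar, W, s2):
    """J_1 + J_2 from eq. (6), written with Frobenius norms."""
    J = 0.0
    for i in range(2):
        F = G[i] @ Ar @ H[1 - i] @ A[1 - i]
        J += (np.linalg.norm(W[i] @ F - np.eye(F.shape[0])) ** 2
              + s2 * np.linalg.norm(W[i] @ G[i] @ Ar) ** 2
              + s2 * np.linalg.norm(W[i]) ** 2)
    return J

def relay_update(H, G, A, W, s2, tau_r, tol=1e-10):
    """Relay precoder from eqs. (21)-(22) with bisection on the multiplier."""
    M = H[0].shape[0]
    Rx_i = [H[i] @ A[i] @ herm(H[i] @ A[i]) + s2 * np.eye(M) for i in range(2)]
    Rx = Rx_i[0] + Rx_i[1] - s2 * np.eye(M)
    Rr_i = [herm(W[i] @ G[i]) @ (W[i] @ G[i]) for i in range(2)]
    Rr = herm(H[1] @ A[1] @ W[0] @ G[0]) + herm(H[0] @ A[0] @ W[1] @ G[1])
    K = np.kron(Rx_i[1].T, Rr_i[0]) + np.kron(Rx_i[0].T, Rr_i[1])
    S = np.kron(Rx.T, np.eye(M))
    r = Rr.reshape(-1, order="F")
    solve = lambda lam: np.linalg.solve(K + lam * S, r).reshape(M, M, order="F")
    power = lambda Ar: np.trace(Ar @ Rx @ herm(Ar)).real
    Ar = solve(0.0)
    if power(Ar) <= tau_r:
        return Ar
    lo, hi = 0.0, np.sqrt(np.trace(Rr @ np.linalg.solve(Rx, herm(Rr))).real / tau_r)
    while hi - lo > tol * max(hi, 1.0):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if power(solve(mid)) > tau_r else (lo, mid)
    return solve(hi)

N, M, s2, tau_r = 2, 2, 0.1, 2.0
H = [crandn(M, N) for _ in range(2)]
G = [crandn(N, M) for _ in range(2)]
A = [np.eye(N), np.eye(N)]                                   # sources kept fixed in this sketch
Rx = sum(H[i] @ A[i] @ herm(H[i] @ A[i]) for i in range(2)) + s2 * np.eye(M)
Ar = np.sqrt(tau_r / np.trace(Rx).real) * np.eye(M)          # feasible starting point

history = []
for _ in range(15):
    W = decoders(H, G, A, Ar, s2)                            # step 1: decoders
    history.append(total_mse(H, G, A, Ar, W, s2))
    Ar = relay_update(H, G, A, W, s2, tau_r)                 # step 2: relay precoder
```

The list `history` records the Total-MSE after each decoder update; adding the QCQP step for the source precoders would complete the loop described in Algorithm 1.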

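Theorem 2: The proposed iterative precoding design algorithm, Algorithm 1, is convergent and the
limit point of the iteration is a stationary point of (7).

Proof: Since in the proposed algorithm, the solution for each subproblem is optimal, the Total-MSE
is decreased with each iteration. Meanwhile, the Total-MSE is lower bounded (at least by zero). Hence,
the proposed algorithm is convergent. It further means that there must exist a limit point, denoted as
$\{\bar{\mathbf{W}}_i, \bar{\mathbf{A}}_i, i = 1, 2, \bar{\mathbf{A}}_r\}$, after the convergence. At the limit point, the solutions will not change if we
continue the iteration. Otherwise, the Total-MSE can be further decreased and it contradicts the assumption
of convergence. Since $\bar{\mathbf{W}}_i$, $\bar{\mathbf{A}}_i$ ($i = 1, 2$) and $\bar{\mathbf{A}}_r$ are local minimizers for each subproblem, we have

$$\mathrm{Tr}\left\{ \nabla_{\mathbf{W}_i} J_w(\bar{\mathbf{W}}_i; \bar{\mathbf{A}}_i, \bar{\mathbf{A}}_r, i = 1, 2)^T (\mathbf{W}_i - \bar{\mathbf{W}}_i) \right\} \ge 0,$$
$$\mathrm{Tr}\left\{ \nabla_{\mathbf{A}_r} J_r(\bar{\mathbf{A}}_r; \bar{\mathbf{A}}_i, \bar{\mathbf{W}}_i, i = 1, 2)^T (\mathbf{A}_r - \bar{\mathbf{A}}_r) \right\} \ge 0,$$
$$\mathrm{Tr}\left\{ \nabla_{\mathbf{A}_i} J_s(\bar{\mathbf{A}}_i; \bar{\mathbf{W}}_i, \bar{\mathbf{A}}_r, i = 1, 2)^T (\mathbf{A}_i - \bar{\mathbf{A}}_i) \right\} \ge 0,$$

where $J_w = J_{w_1} + J_{w_2}$, $J_r = J_{r_1} + J_{r_2}$ and $J_s = J_{s_1} + J_{s_2}$. Summing up all the above equations, we get

$$\mathrm{Tr}\left\{ \nabla_{\mathbf{X}} J(\bar{\mathbf{X}})^T (\mathbf{X} - \bar{\mathbf{X}}) \right\} \ge 0, \tag{26}$$

where $J = J_1 + J_2$ and $\mathbf{X} = [\mathbf{W}_1, \mathbf{W}_2, \mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_r]$. Result (26) implies the stationarity of $\bar{\mathbf{X}}$ of (7) by
definition. ■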



³Here, $\mathbf{A}_i$ and $\mathbf{A}_r$ can be randomly generated complex matrices or set as identity matrices, as long as they satisfy the given
power constraints.

Remark 1: In this work, the precoders are designed to minimize the Total-MSE of all the data streams
of two users. This may lead to unbalanced MSE distribution among the data streams. In general, the
overall error performance is dominated by the data stream with the highest MSE [28]. Therefore, an
alternative objective is to minimize the maximum per-stream MSE among all the data streams in order
to improve the overall performance. Nevertheless, in [28], it has been proven that the min-max MSE
problem can be solved through the Total-MSE minimization. Specifically, the solutions to the min-max
problem can be obtained by multiplying the source precoder $\mathbf{A}_i$ of the Total-MSE problem with a rotation
matrix so that the MSE matrix has equal diagonal entries.
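The rotation idea in Remark 1 can be illustrated with a small sketch. The construction below is one possible choice and is an assumption, not necessarily the rotation used in [28]: the precoder is right-multiplied by $\mathbf{\Theta} = \mathbf{V}\mathbf{D}$, where $\mathbf{V}$ holds the eigenvectors of the MSE matrix and $\mathbf{D}$ is the unitary DFT matrix. The matrix `K` is a random stand-in for the effective channel Gram matrix seen by the source precoder.

```python
import numpy as np

rng = np.random.default_rng(4)

def crandn(*shape):
    """Hypothetical helper: unit-variance complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N = 4
X = crandn(N, N)
K = X @ X.conj().T                 # Hermitian PSD stand-in for the effective channel Gram matrix
A = crandn(N, N)                   # stand-in for a Total-MSE source precoder

def mse_matrix(A):
    """MSE matrix (I + A^H K A)^{-1}; its trace is the sum of per-stream MSEs."""
    return np.linalg.inv(np.eye(N) + A.conj().T @ K @ A)

E = mse_matrix(A)
_, V = np.linalg.eigh(E)                       # E = V diag(d) V^H
dft = np.fft.fft(np.eye(N)) / np.sqrt(N)       # unitary DFT matrix (assumed rotation choice)
Theta = V @ dft                                # unitary rotation
A_rot = A @ Theta                              # rotated precoder
E_rot = mse_matrix(A_rot)                      # equals Theta^H E Theta

per_stream = np.diag(E_rot).real               # per-stream MSEs after rotation
```

Since $\mathbf{\Theta}$ is unitary, the rotation changes neither the transmit power $\mathrm{Tr}(\mathbf{A}\mathbf{A}^H)$ nor the Total-MSE; it only spreads the total evenly so that every entry of `per_stream` equals the average MSE.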

IV. Low-Complexity Precoding Design Based on Channel Parallelization 

The iterative precoding design algorithm presented in Section III obtains good performance as verified 
in Section VI, but also has high computational complexity. In this section, we propose a new precoding 
design that offers a good balance between performance and complexity. 

It has been proven in [24]–[26], [31]–[33] that the optimal precoding structure in one-way relaying
is to first parallelize the channels between the source and the relay, as well as between the relay and
the destination using singular value decomposition (SVD) and then match the eigen-channels in the two
hops. Taking the transmission of a single data stream in a one-way relay system for example as considered
in [34] and [35], the idea of channel matching is as follows. The source should use the dominant right
singular vector of the channel in the first hop as beamformer to transmit its signal. After receiving the
signal from the source, the relay should first multiply it with the dominant left singular vector of the
same channel and then transmit it through the dominant right singular vector of the channel in the second
hop.

Motivated by the findings in [24]–[26], [31]–[35], we aim to design $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_r$ so as to
simultaneously parallelize the bidirectional links in the MIMO two-way relay system. In the following,
we introduce a heuristic channel parallelization method for bidirectional communications by using two
joint channel decomposition methods, namely, generalized singular value decomposition (GSVD) for the
MAC phase and SVD for the BC phase. Using this method we then reduce the precoder design to a
simple power allocation problem.

A. Channel Parallelization 

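The major task of simultaneously parallelizing the bidirectional links is to jointly decompose the
forward channel matrix pair $\{\mathbf{H}_1, \mathbf{H}_2\}$ in the MAC phase and the backward channel matrix pair $\{\mathbf{G}_1, \mathbf{G}_2\}$
in the BC phase. To do so, we first apply the GSVD technique for the MAC channels. The GSVD is
elaborated in the lemma below.

Lemma 4 [36]: Assuming $\mathbf{A} \in \mathbb{C}^{m \times n}$ and $\mathbf{B} \in \mathbb{C}^{m \times n}$, $m \le n \le 2m$, are two full-rank matrices that
satisfy $\mathrm{Rank}\left( \begin{bmatrix} \mathbf{A} \\ \mathbf{B} \end{bmatrix} \right) = n$, there exist two $m \times m$ unitary matrices $\mathbf{U}_a$, $\mathbf{U}_b$ and an $n \times n$ non-singular
matrix $\mathbf{V}$ which make

$$\mathbf{A} = \mathbf{U}_a \mathbf{\Sigma}_a \mathbf{V}, \quad \mathbf{B} = \mathbf{U}_b \mathbf{\Sigma}_b \mathbf{V},$$

where $\mathbf{\Sigma}_a = [\mathbf{0}_{m \times (n-m)}, \mathbf{\Lambda}_a]$, $\mathbf{\Sigma}_b = [\mathbf{\Lambda}_b, \mathbf{0}_{m \times (n-m)}]$ and they satisfy $\mathbf{\Sigma}_a^T \mathbf{\Sigma}_a + \mathbf{\Sigma}_b^T \mathbf{\Sigma}_b = \mathbf{I}_n$. Here $\mathbf{\Lambda}_a$
and $\mathbf{\Lambda}_b$ are two $m \times m$ non-negative diagonal matrices.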
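For the square case $m = n$ (the case $M = N$ that is used later in this section), a decomposition of the form in Lemma 4 can be obtained from a QR factorization followed by an SVD. The sketch below is one way to compute it, offered as an assumption-labeled illustration rather than the method of [36]; the function name `gsvd_square` is hypothetical, and it assumes generic matrices so that no generalized singular value is exactly 0 or 1.

```python
import numpy as np

rng = np.random.default_rng(6)

def crandn(*shape):
    """Hypothetical helper: unit-variance complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def gsvd_square(A, B):
    """Return Ua, La, Ub, Lb, V with A = Ua La V, B = Ub Lb V and La^2 + Lb^2 = I (m = n only)."""
    m = A.shape[0]
    Q, R = np.linalg.qr(np.vstack([A, B]))   # stacked matrix = Q R, Q has orthonormal columns
    Q1, Q2 = Q[:m], Q[m:]                    # so Q1^H Q1 + Q2^H Q2 = I
    Ua, c, Wh = np.linalg.svd(Q1)            # Q1 = Ua diag(c) Wh
    Z = Q2 @ Wh.conj().T                     # columns are orthogonal with norms sqrt(1 - c^2)
    s = np.linalg.norm(Z, axis=0)
    Ub = Z / s                               # normalize each column (assumes s > 0)
    V = Wh @ R                               # shared non-singular factor
    return Ua, np.diag(c), Ub, np.diag(s), V

m = 3
A_mat, B_mat = crandn(m, m), crandn(m, m)
Ua, La, Ub, Lb, V = gsvd_square(A_mat, B_mat)

err_A = np.linalg.norm(A_mat - Ua @ La @ V)
err_B = np.linalg.norm(B_mat - Ub @ Lb @ V)
err_unitary = max(np.linalg.norm(Ua.conj().T @ Ua - np.eye(m)),
                  np.linalg.norm(Ub.conj().T @ Ub - np.eye(m)))
err_sum = np.linalg.norm(La @ La + Lb @ Lb - np.eye(m))
```

In the square case the zero blocks in $\mathbf{\Sigma}_a$ and $\mathbf{\Sigma}_b$ vanish, so $\mathbf{\Sigma}_a = \mathbf{\Lambda}_a$ and $\mathbf{\Sigma}_b = \mathbf{\Lambda}_b$, and both matrices share the same right factor $\mathbf{V}$.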



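By applying Lemma 4 to the channel pair $\{\mathbf{H}_1^T, \mathbf{H}_2^T\}$, $\mathbf{H}_1$ and $\mathbf{H}_2$ can be expressed as

$$\mathbf{H}_1 = \mathbf{V}_h \mathbf{\Sigma}_{h_1} \mathbf{U}_{h_1}^H, \quad \mathbf{H}_2 = \mathbf{V}_h \mathbf{\Sigma}_{h_2} \mathbf{U}_{h_2}^H, \tag{27}$$

where $\mathbf{V}_h$ is a non-singular $M \times M$ complex matrix, $\mathbf{U}_{h_1}$ and $\mathbf{U}_{h_2}$ are two $N \times N$ unitary matrices,
$\mathbf{\Sigma}_{h_1} = \left[ \mathbf{0}^T, \mathbf{\Lambda}_{h_1}^T \right]^T$ and $\mathbf{\Sigma}_{h_2} = \left[ \mathbf{\Lambda}_{h_2}^T, \mathbf{0}^T \right]^T$, where $\mathbf{\Lambda}_{h_1}$ and $\mathbf{\Lambda}_{h_2}$ are two $N \times N$ non-negative
diagonal matrices. If the relay precoder $\mathbf{A}_r$ contains $\mathbf{V}_h^{-1}$ at the right side and $\mathbf{A}_i$ has $\mathbf{U}_{h_i}$ at
the left side, we can parallelize the two forward channels in the MAC phase.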

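For the BC phase, since the superimposed signal should be simultaneously transmitted to two
destinations, we construct one virtual point-to-point MIMO channel as $\mathbf{G} = \left[ \mathbf{G}_1^T, \mathbf{G}_2^T \right]^T$. By imposing the SVD
technique on $\mathbf{G}$, we have

$$\mathbf{G} = \mathbf{V}_g \mathbf{\Sigma}_g \mathbf{U}_g^H, \tag{28}$$

where $\mathbf{V}_g$ and $\mathbf{U}_g$ are $2N \times 2N$ and $M \times M$ unitary matrices, respectively, and $\mathbf{\Sigma}_g = \left[ \mathbf{\Lambda}_g^T, \mathbf{0}^T \right]^T$,
where $\mathbf{\Lambda}_g$ is an $M \times M$ non-negative diagonal matrix. If $\mathbf{A}_r$ contains $\mathbf{U}_g$ at its left side, the virtual
point-to-point MIMO channel $\mathbf{G}$ is parallelized in the BC phase. Accordingly, we can rewrite $\mathbf{G}_1$ and
$\mathbf{G}_2$ as

$$\mathbf{G}_1 = \mathbf{V}_{g_1} \mathbf{\Sigma}_g \mathbf{U}_g^H, \quad \mathbf{G}_2 = \mathbf{V}_{g_2} \mathbf{\Sigma}_g \mathbf{U}_g^H,$$

where $\mathbf{V}_{g_1} = \mathbf{V}_g(1:N, 1:2N)$ and $\mathbf{V}_{g_2} = \mathbf{V}_g(N+1:2N, 1:2N)$. Note that $\mathbf{V}_{g_1}$ and $\mathbf{V}_{g_2}$ no longer
have the unitary property.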
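The BC-phase decomposition maps directly onto a full SVD of the stacked channel. The following NumPy sketch uses illustrative sizes with $M \le 2N$ and hypothetical variable names; it forms $\mathbf{G}$, splits $\mathbf{V}_g$ into its two row blocks, and checks that a relay precoder with $\mathbf{U}_g$ on its left sees the channels $\mathbf{V}_{g_i} \mathbf{\Sigma}_g$.

```python
import numpy as np

rng = np.random.default_rng(8)

def crandn(*shape):
    """Hypothetical helper: unit-variance complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

N, M = 2, 3                                   # requires M <= 2N in this sketch
G1, G2 = crandn(N, M), crandn(N, M)
G = np.vstack([G1, G2])                       # virtual point-to-point channel, 2N x M

Vg, lam_g, UgH = np.linalg.svd(G)             # full SVD: G = Vg Sigma_g Ug^H
Ug = UgH.conj().T
Sigma_g = np.zeros((2 * N, M))
Sigma_g[:M, :M] = np.diag(lam_g)              # Sigma_g = [Lambda_g; 0]

Vg1, Vg2 = Vg[:N, :], Vg[N:, :]               # row blocks V_{g_1}, V_{g_2} (N x 2N each)

# With U_g on the left of A_r, each BC channel reduces to V_{g_i} Sigma_g.
err1 = np.linalg.norm(G1 @ Ug - Vg1 @ Sigma_g)
err2 = np.linalg.norm(G2 @ Ug - Vg2 @ Sigma_g)

# The row blocks have orthonormal rows but are not unitary.
rows_orthonormal = np.allclose(Vg1 @ Vg1.conj().T, np.eye(N))
is_unitary = np.allclose(Vg1.conj().T @ Vg1, np.eye(2 * N))
```

The last two checks make the remark about the lost unitary property concrete: each block has orthonormal rows, but $\mathbf{V}_{g_i}^H \mathbf{V}_{g_i}$ has rank $N$ and cannot equal $\mathbf{I}_{2N}$.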

We now readily propose the following structure for the three precoders:

$$A_1 = U_{h_1} \Lambda_{A_1} V_{A_1}, \quad A_2 = U_{h_2} \Lambda_{A_2} V_{A_2}, \quad A_r = U_g \Lambda_{A_r} V_h^{-H}, \tag{29}$$

where $V_{A_1}$ and $V_{A_2}$ are arbitrary unitary matrices, and $\Lambda_{A_1}$, $\Lambda_{A_2}$ and $\Lambda_{A_r}$ are $N \times N$, $N \times N$ and $M \times M$ real diagonal matrices, respectively, to be optimized in the next subsection.

Footnote: To apply Lemma 4, we here assume that $M \le 2N$.
The received signal can therefore be rewritten as

$$y_i = V_{g_i} \Sigma_g \Lambda_{A_r} \Sigma_{h_j} \Lambda_{A_j} \tilde{s}_j + V_{g_i} \Sigma_g \Lambda_{A_r} \tilde{n}_r + n_i, \quad i = 1, 2, \tag{30}$$

where $\tilde{s}_j = V_{A_j} s_j$ and $\tilde{n}_r = V_h^{-H} n_r$. Note that since $V_{A_j}$ is unitary, it affects neither the statistical property of $s_j$ nor the designed precoders. Given $M > N$, since $\Sigma_{h_1} = [0^T, \Lambda_{h_1}^T]^T$ and $\Sigma_{h_2} = [\Lambda_{h_2}^T, 0^T]^T$ as given by the GSVD, the effective channel gains for the $N$ data streams of the two sources cannot be matched simultaneously. In other words, the gain of a certain data stream for $S_1$ may be very strong while the gain of the corresponding data stream for $S_2$ can be very weak. To avoid such imbalance, unless specified otherwise, we only consider the case $M = N$ in the remainder of this section, where all the channel gains can be utilized for transmission of both users. Then, (30) becomes

$$y_i = V_{g_i} \Lambda_g \Lambda_{A_r} \Lambda_{h_j} \Lambda_{A_j} \tilde{s}_j + V_{g_i} \Lambda_g \Lambda_{A_r} \tilde{n}_r + n_i, \quad i = 1, 2,$$

where $V_{g_1} = V_g(1:N, 1:N)$ and $V_{g_2} = V_g(N+1:2N, 1:N)$, and $\Lambda_k$, for $k \in \{A_1, A_2, A_r, g, h_1, h_2\}$,
is an $N \times N$ non-negative diagonal matrix.



B. Joint Power Allocation 

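Based on the precoder structures proposed in (29), in this subsection we discuss the joint optimization of $\Lambda_{A_1}$, $\Lambda_{A_2}$ and $\Lambda_{A_r}$ to minimize the Total-MSE of the two users. By substituting (29) into (10), we rewrite $J_i$ as

$$J_i = \mathrm{Tr}\left\{\left[I_N + (\Lambda_{A_j}\Lambda_{h_j}\Lambda_{A_r}\Lambda_g)\left(\sigma_i^2 B_{g_i} + \sigma_r^2 \Lambda_g\Lambda_{A_r} B_h \Lambda_{A_r}\Lambda_g\right)^{-1}(\Lambda_g\Lambda_{A_r}\Lambda_{h_j}\Lambda_{A_j})\right]^{-1}\right\}, \tag{31}$$

where $B_{g_i} = (V_{g_i}^H V_{g_i})^{-1}$ and $B_h = (V_h V_h^H)^{-1}$. Although $J_i$, $i = 1, 2$, has been simplified, the MSE covariance matrices are still non-diagonal, so solving the optimization problem directly is difficult. However, we can resort to a tractable upper bound on the MSE to simplify the problem.

Lemma 5: An upper bound of $J_i$ defined in (31) is given by

$$J_i \le \mathrm{Tr}\left\{\left[I_N + (\Lambda_{A_j}\Lambda_{h_j}\Lambda_{A_r}\Lambda_g)\left(\sigma_i^2 \Lambda_{B_{g_i}} + \sigma_r^2 \Lambda_g\Lambda_{A_r} \Lambda_{B_h} \Lambda_{A_r}\Lambda_g\right)^{-1}(\Lambda_g\Lambda_{A_r}\Lambda_{h_j}\Lambda_{A_j})\right]^{-1}\right\}, \tag{32}$$

where $\Lambda_{B_{g_i}}$ and $\Lambda_{B_h}$ are two diagonal matrices that contain the diagonal entries of $B_{g_i}$ and $B_h$, respectively.

Proof: Please refer to Appendix D. ■

The MSE upper bound matrix in (32) has a diagonal structure. Therefore, we can minimize the upper bound to design the precoders. By further letting $P_k = \Lambda_k^2$ for $k \in \{A_1, A_2, A_r, g, h_1, h_2\}$, the upper bound in Lemma 5, denoted as $J_i^u$, can be reformulated as

$$J_i^u = \sum_{n=1}^{N}\left[1 + \frac{p_{A_j}^n p_{h_j}^n p_{A_r}^n p_g^n}{\sigma_i^2 \lambda_{B_{g_i}}^n + \sigma_r^2 \lambda_{B_h}^n p_g^n p_{A_r}^n}\right]^{-1}, \quad i = 1, 2, \tag{33}$$

where the $p_k^n$'s are the diagonal entries of $P_k$ and the $\lambda_k^n$'s with $k \in \{B_h, B_{g_1}, B_{g_2}\}$ are the diagonal entries of $\Lambda_k$. It is interesting to find that $J_i^u$ is the Total-MSE of each sub-parallelized channel after zero forcing by $V_{g_i}^{-1}$.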
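The bound of Lemma 5 can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original derivation: the function names `random_pd` and `mse_exact_and_bound` are hypothetical, the matrices $B_{g_i}$ and $B_h$ are replaced by random Hermitian positive definite matrices, and the diagonal gains are drawn at random.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4  # illustrative number of data streams

def random_pd(n):
    """Hypothetical stand-in for B_g or B_h: a random Hermitian positive definite matrix."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + n * np.eye(n)

def mse_exact_and_bound(lam_A, lam_h, lam_Ar, lam_g, B_g, B_h, sigma_i2, sigma_r2):
    """Return the exact MSE Tr{[I + D C^-1 D]^-1} and the bound using only diag(C)."""
    D = np.diag(lam_g * lam_Ar * lam_h * lam_A)    # end-to-end diagonal gain
    K = np.diag(lam_g * lam_Ar)                    # gain seen by the relay noise
    C = sigma_i2 * B_g + sigma_r2 * K @ B_h @ K    # effective noise covariance
    C_diag = np.diag(np.diag(C))                   # keep diagonal entries only
    I = np.eye(len(lam_A))
    exact = np.trace(np.linalg.inv(I + D @ np.linalg.inv(C) @ D)).real
    bound = np.trace(np.linalg.inv(I + D @ np.linalg.inv(C_diag) @ D)).real
    return exact, bound

gains = [rng.uniform(0.5, 2.0, N) for _ in range(4)]
exact, bound = mse_exact_and_bound(*gains, random_pd(N), random_pd(N), 0.3, 0.7)
```

In every random draw the exact value stays below the diagonalized one, and the two coincide when the stand-in matrices are themselves diagonal.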

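Finally, the precoder design can be simplified to the following optimization problem:

$$\begin{aligned} \min_{p_{A_1}^n,\, p_{A_2}^n,\, p_{A_r}^n,\, \forall n} \quad & J_1^u + J_2^u \\ \text{s.t.} \quad & \sum_{n=1}^{N} p_{A_1}^n \le \tau_1, \ \sum_{n=1}^{N} p_{A_2}^n \le \tau_2, \ p_{A_1}^n \ge 0, \ p_{A_2}^n \ge 0, \ p_{A_r}^n \ge 0, \ \forall n, \\ & \sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) \le \tau_r. \end{aligned} \tag{34}$$

Compared with the original objective function in (31), the expression in (34) exhibits a simpler form and is more analytically tractable. Nevertheless, problem (34) is still a non-convex optimization problem. In the following, we apply an iterative approach that converts problem (34) into two convex sub-problems.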

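1) Sub-problem 1: For given $p_{A_1}^n$ and $p_{A_2}^n$, $\forall n$, we formulate the following problem to get the optimal $P_{A_r}$:

$$\begin{aligned} \min_{p_{A_r}^n,\, \forall n} \quad & J_1^u + J_2^u \\ \text{s.t.} \quad & \sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) \le \tau_r, \quad p_{A_r}^n \ge 0, \ \forall n. \end{aligned} \tag{35}$$

By verifying

$$\frac{\partial^2 J_i^u}{\partial (p_{A_r}^n)^2} = \frac{2 \sigma_i^2 \lambda_{B_{g_i}}^n p_{h_j}^n p_g^n p_{A_j}^n \left(\sigma_r^2 \lambda_{B_h}^n p_g^n + p_{h_j}^n p_g^n p_{A_j}^n\right)}{\left[\sigma_i^2 \lambda_{B_{g_i}}^n + p_{A_r}^n \left(\sigma_r^2 \lambda_{B_h}^n p_g^n + p_{h_j}^n p_g^n p_{A_j}^n\right)\right]^3} \ge 0, \quad i = 1, 2,$$

we conclude that this sub-problem is convex. Based on the KKT conditions (details presented in Appendix E), we derive the water-filling solution

$$p_{A_r}^n = \max\left[0, \mathrm{Root}(f)\right], \quad \forall n, \tag{36}$$

where $\mathrm{Root}(f)$ denotes the maximum real root of the equation $f$, which is given by

$$\sum_{i=1}^{2} \frac{\sigma_i^2 \lambda_{B_{g_i}}^n p_{h_j}^n p_g^n p_{A_j}^n}{\left[\sigma_i^2 \lambda_{B_{g_i}}^n + p_{A_r}^n \left(\sigma_r^2 \lambda_{B_h}^n p_g^n + p_{h_j}^n p_g^n p_{A_j}^n\right)\right]^2} = \mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right), \tag{37}$$

and the variable $\mu$ should be chosen to satisfy

$$\sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) = \tau_r.$$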

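2) Sub-problem 2: For given $p_{A_r}^n$, $\forall n$, we obtain $p_{A_1}^n$ and $p_{A_2}^n$ by solving the following optimization problem:

$$\begin{aligned} \min_{p_{A_1}^n,\, p_{A_2}^n,\, \forall n} \quad & J_1^u + J_2^u \\ \text{s.t.} \quad & \sum_{n=1}^{N} p_{A_1}^n \le \tau_1, \ \sum_{n=1}^{N} p_{A_2}^n \le \tau_2, \ p_{A_1}^n \ge 0, \ p_{A_2}^n \ge 0, \ \forall n, \\ & \sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) \le \tau_r. \end{aligned} \tag{38}$$

Also by verifying

$$\frac{\partial^2 J_i^u}{\partial (p_{A_j}^n)^2} = \frac{2 \left(\sigma_i^2 \lambda_{B_{g_i}}^n + \sigma_r^2 \lambda_{B_h}^n p_g^n p_{A_r}^n\right) \left(p_{h_j}^n p_g^n p_{A_r}^n\right)^2}{\left[\sigma_i^2 \lambda_{B_{g_i}}^n + \left(\sigma_r^2 \lambda_{B_h}^n p_g^n p_{A_r}^n + p_{h_j}^n p_g^n p_{A_r}^n p_{A_j}^n\right)\right]^3} \ge 0, \quad i = 1, 2,$$

the sub-problem (38) is also convex. However, a closed-form solution to this problem is generally not available. Standard numerical methods, such as the interior-point method, can be used to obtain the optimal solution.

The solutions of Sub-problem 1 and Sub-problem 2 show that $P_{A_r}$ and $P_{A_i}$ are tightly coupled. Thus, we apply an iterative approach to find the final solution. As verified by our simulation, the algorithm converges in only a few iterations. After obtaining $\Lambda_{A_1}$, $\Lambda_{A_2}$ and $\Lambda_{A_r}$ from the square roots of $P_{A_1}$, $P_{A_2}$ and $P_{A_r}$, we substitute them into (29) to get the precoders.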

The overall algorithm is outlined as follows:

Algorithm 2 (Channel-parallelization based precoding)

• Decompose the channel pairs $\{H_1, H_2\}$ and $\{G_1, G_2\}$ by using (27) and (28), respectively, to get $\Lambda_{h_1}$, $\Lambda_{h_2}$, $\Lambda_g$, $B_{g_i}$ and $B_h$.

• Repeat

- Update the relay power allocation using (36) to get $\Lambda_{A_r}$;

- Update the source power allocation by solving (38) to get $\Lambda_{A_1}$ and $\Lambda_{A_2}$;

• Until termination criterion is satisfied.

• Substitute the solved $\Lambda_{A_1}$, $\Lambda_{A_2}$ and $\Lambda_{A_r}$ into (29) to get the precoders $A_1$, $A_2$ and $A_r$.

V. Source-antenna-selection based Precoding for Single Data Stream

In this section, we consider the precoding design for the special case where only a single data stream 
is transmitted from each source. The iterative approach proposed in Section III can be applied directly, 
except that the source precoding matrices reduce to beamforming vectors. In what follows, we introduce 
a new precoding strategy based on antenna selection at two sources. Antenna selection can be viewed as a 
special case of beamforming. In general, it is computationally less complex and requires lower feedback 
overhead. This motivates us to consider the source antenna selection while using precoding at the relay 
node only. 

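For single-data-stream transmission, the received signal at each destination node is simplified as

$$y_i = \sqrt{\tau_j}\, G_i A_r h_{jn} s_j + G_i A_r n_r + n_i, \quad i = 1, 2,$$

where $h_{jn}$ is the selected forward channel vector for $S_j$ in the MAC phase. After decoding by $w_i$, the corresponding MSE at $S_i$ is denoted as

$$J_i = w_i^H G_i A_r R_{x_j} A_r^H G_i^H w_i - \sqrt{\tau_j}\, w_i^H G_i A_r h_{jn} - \sqrt{\tau_j}\, h_{jn}^H A_r^H G_i^H w_i + \sigma_i^2 w_i^H w_i + 1, \quad i = 1, 2,$$

where $R_{x_j} = \tau_j h_{jn} h_{jn}^H + \sigma_r^2 I_M$. Thus, for a given selected antenna pair $\{h_{1n}, h_{2m}\}$, the optimization problem is formulated as

$$\begin{aligned} \min_{A_r,\, w_1,\, w_2} \quad & J_1 + J_2 \\ \text{s.t.} \quad & \mathrm{Tr}\left\{A_r \left(\tau_1 h_{1n} h_{1n}^H + \tau_2 h_{2m} h_{2m}^H + \sigma_r^2 I_M\right) A_r^H\right\} \le \tau_r. \end{aligned}$$

Next, we take two steps to solve for $w_1$, $w_2$ and $A_r$, respectively. First, for fixed $A_r$, the optimal $w_i$ is given by

$$w_i = \sqrt{\tau_j} \left[G_i A_r R_{x_j} A_r^H G_i^H + \sigma_i^2 I_N\right]^{-1} G_i A_r h_{jn}, \quad i = 1, 2. \tag{39}$$

Subsequently, for fixed $w_1$ and $w_2$, we obtain the optimal $A_r$ as

$$A_r = \mathrm{mat}\left\{\left[R_{x_2}^T \otimes (G_1^H w_1 w_1^H G_1) + R_{x_1}^T \otimes (G_2^H w_2 w_2^H G_2) + \mu R_x^T \otimes I_M\right]^{-1} \mathrm{vec}(M)\right\}, \tag{40}$$

where $R_x = \tau_1 h_{1n} h_{1n}^H + \tau_2 h_{2m} h_{2m}^H + \sigma_r^2 I_M$, $M = \sqrt{\tau_1}\, G_2^H w_2 h_{1n}^H + \sqrt{\tau_2}\, G_1^H w_1 h_{2m}^H$, and $\mu \in \left[0, \sqrt{\mathrm{Tr}\{M R_x^{-1} M^H\}/\tau_r}\right]$ is chosen to satisfy the KKT conditions. The derivation is similar to that in Section III and is hence omitted for brevity. In summary, we outline the algorithm as follows: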

Algorithm 3 (Source antenna selection (SAS)-based precoding)

• For each source antenna pair $\{h_{1n}, h_{2m}\}$, $\forall n, m$

- Initialize $A_r$ randomly or as $\sqrt{\tau_r / \mathrm{Tr}\{R_x\}}\, I_M$ with $R_x = \tau_1 h_{1n} h_{1n}^H + \tau_2 h_{2m} h_{2m}^H + \sigma_r^2 I_M$;

- Repeat

* Update the decoding vectors by (39) for fixed $A_r$;

* Update the relay precoder by (40) for fixed $w_1$ and $w_2$;

- Until termination criterion is satisfied.

• End

• Choose the source antenna pair and the corresponding $w_1$, $w_2$ and $A_r$ that lead to the minimal Total-MSE $J_1 + J_2$.
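The decoder-update step of Algorithm 3 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes the single-stream model $y_i = \sqrt{\tau_j} G_i A_r h_j s_j + G_i A_r n_r + n_i$ with unit-power symbols, the function names `mmse_decoder` and `mse` are hypothetical, and the channel, precoder and power values are random placeholders. Only the first of the two alternating steps is shown.

```python
import numpy as np

def mmse_decoder(G_i, A_r, h_j, tau_j, sigma_r2, sigma_i2):
    """Linear MMSE decoding vector w_i for a fixed relay precoder A_r."""
    M = A_r.shape[0]
    R_xj = tau_j * np.outer(h_j, h_j.conj()) + sigma_r2 * np.eye(M)  # relay input covariance
    E = G_i @ A_r                                                    # effective BC channel
    cov = E @ R_xj @ E.conj().T + sigma_i2 * np.eye(G_i.shape[0])    # received covariance
    return np.sqrt(tau_j) * np.linalg.solve(cov, E @ h_j)

def mse(w, G_i, A_r, h_j, tau_j, sigma_r2, sigma_i2):
    """MSE E|w^H y_i - s_j|^2 for unit-power s_j under the assumed model."""
    M = A_r.shape[0]
    R_xj = tau_j * np.outer(h_j, h_j.conj()) + sigma_r2 * np.eye(M)
    E = G_i @ A_r
    quad = w.conj() @ (E @ R_xj @ E.conj().T) @ w
    cross = np.sqrt(tau_j) * (w.conj() @ E @ h_j)
    return (quad - 2.0 * cross.real + sigma_i2 * (w.conj() @ w) + 1.0).real

# Random placeholder setup: N antennas per source, M antennas at the relay
rng = np.random.default_rng(3)
N, M = 2, 2
G_i = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
A_r = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
h_j = rng.standard_normal(M) + 1j * rng.standard_normal(M)

w_opt = mmse_decoder(G_i, A_r, h_j, tau_j=1.0, sigma_r2=0.1, sigma_i2=0.1)
J_opt = mse(w_opt, G_i, A_r, h_j, 1.0, 0.1, 0.1)
```

Because the MSE is a convex quadratic in $w_i$ for fixed $A_r$, any perturbation of `w_opt` can only increase `J_opt`, which is a convenient sanity check when implementing the iteration.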

Remark 2: Compared with the three-step iterative precoding algorithm, Algorithm 1, the SAS-based 
precoding algorithm, Algorithm 3, only needs two steps in each iteration. Additionally, the closed-form 
solution can be employed in each iteration. Thus, no advanced software package is needed here. 

VI. Simulation results and Discussions 

In this section, we present some simulation examples to evaluate the proposed precoding designs. The
channel is set to be Rayleigh fading, i.e., the elements of each channel matrix are complex Gaussian
random variables with zero mean and unit variance. For simplicity, we consider the reciprocal channel
where $G_1 = H_1^T$ and $G_2 = H_2^T$ (our algorithm is also suitable for the general case where $G_i$ and $H_i$ are
independent). The noise powers at the two destinations are set to be equal, i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
The average signal-to-noise ratios (SNRs) of the two sources in the MAC phase and of the relay in the BC phase
are denoted by $\rho_1$, $\rho_2$ and $\rho_r$, respectively. The average bit error rate (BER) using quadrature phase-shift keying
(QPSK) modulation is simulated.

A. Convergence and Robustness of the Proposed Iterative Algorithm 

Fig. 2 illustrates the convergence behavior of the iterative algorithm presented in Section III as a
function of SNR at $N = M = 2$. In the low SNR regime, the iterative algorithm converges
within 10 iterations. With medium SNR, it converges after about 30 iterations, while in the high SNR
regime, 50 iterations are always enough.

Since the proposed iterative precoding algorithm only finds a locally optimal solution due to the
non-convexity of the primal problem, different initialization points may result in different convergent solutions.
Fig. 3 and Fig. 4 show the performance comparison with different initialization points at $N = M = 2$ and
$N = M = 4$, respectively. Here, "Identity" means that the algorithm is initialized by the identity matrix,
while "Random" followed by a number (e.g., "Random 5") means that the stated number of randomly generated
initialization points are tried and the one with the best performance is finally chosen. We observe that the
BER performance gain from choosing the best out of different initialization points is minimal. We thus conclude
that the proposed iterative precoding algorithm is robust to the initialization points and hence near optimal.
For the rest of the simulation, the "Identity" initialization point is adopted unless specified otherwise.

B. Performance Comparison for Multi-data-stream Transmission 

In Fig. 5 and Fig. 6, we show the MSE and BER performance comparison of the proposed iterative
precoding design and the channel-parallelization based precoding design (CP-precoding) as a function
of $\rho_1 = \rho_2 = \rho_r$ at $N = M = 2$. For comparison, the CP-precoding design with uniform power allocation
(uniform CP-precoding), i.e., equal power distribution among all data streams, is also simulated. We find
that with both the iterative precoding and the CP-precoding, the system BER decreases considerably when
the SNR increases. This demonstrates the effectiveness of the proposed precoding designs. We also find that
the uniform CP-precoding only achieves marginal gain over the non-precoding case. This is due to the
fact that uniform power allocation can lead to unfair channel gain distribution among the data streams,
and the system BER performance is dominated by the poorest sub-channel. We thus conclude that it
is essential to optimize the power allocation among data streams for the channel-parallelization based
precoding design. From Fig. 5 and Fig. 6, it is observed that the iterative precoding design exhibits the
best performance among all the proposed precoding designs. We attribute the performance improvement
to not enforcing any structure on the precoders.

Fig. 7 illustrates the BER performance comparison for different relay antenna numbers $M$ when the source
antenna number is fixed at $N = 2$. We find that increasing the number of relay antennas significantly enhances the
BER performance thanks to the increased diversity gain. Moreover, the gain of the proposed precoding
scheme over the non-precoding scheme increases dramatically as the number of relay antennas increases.
This further implies that precoding is more beneficial when the relay node has more antennas than the
source nodes.

Finally, the performance comparison between the proposed iterative joint source/relay precoding and
the relay precoding scheme in [14] is depicted in Fig. 8 at $N = 2$. Since the antenna configuration in
[14] should satisfy the condition $M \ge 2N$, we choose $M = 4$ and $5$ in the simulation. It is shown
that, with either the MMSE or the ZF receiver, the proposed joint source/relay precoding significantly
outperforms the scheme in [14], where precoding is applied at the relay only. This implies that in two-way
relay systems, precoding at the source nodes is very helpful in improving the system performance. It
is also found that the MMSE and ZF receivers obtain almost the same performance for the proposed
precoding algorithm.

C. Performance Comparison for Single-data-stream Transmission

In Fig. 9, we show the BER performance for single-data-stream transmission. Here, the proposed
iterative precoding (proposed ite-precoding) and the source-antenna-selection based precoding (proposed
SAS-precoding) are simulated. We find that the performance gained through precoding is more significant
for the single-data-stream transmission than for the multi-data-stream transmission. This is because there
is no interference from other data streams. In addition, with the "Identity" initialization point, the
SAS-precoding has almost the same performance as in the "Random 5" and "Random 10" cases, and it outperforms
the ite-precoding method with both the "Identity" and "Random 1" initialization points although it needs
lower feedback overhead. The reason is that the optimal beamforming vector at each source cannot
be obtained due to the non-convex nature of the joint optimization problem, while by exhaustively
searching for the most suitable source antenna pair, the SAS-precoding design can achieve better performance.
However, as the number of randomly generated initialization points increases, the ite-precoding design
starts to outperform the SAS-precoding design, as the ite-precoding design approaches the optimal
solution. Moreover, it is shown that the "Random 5" and "Random 10" ite-precoding schemes obtain almost
the same performance. However, such a near-optimal solution has substantially higher computational
complexity and may not be practical for implementation.

VII. Conclusions 

In this paper, we studied the joint source/relay precoding design for AF MIMO two-way relay systems
based on the MSE criterion. An iterative method was first proposed to obtain locally optimal solutions
for the Total-MSE minimization. Then, for the scenario in which all nodes are equipped with the
same number of antennas, we proposed a channel-parallelization based precoding design algorithm to
parallelize the channels in both the MAC and BC phases. By doing so, the joint precoder design is reduced
to a simple power allocation problem. It was shown that the iterative precoding design outperforms the
channel-parallelization based precoding design since no structure constraint is enforced on the precoders.
Although the channel-parallelization method suffers some performance degradation, it reduces
the computational complexity. When a single data stream is transmitted from each source, the precoding
at the source nodes can be replaced by antenna selection. In this way, the system feedback overhead is
reduced and no advanced software package is needed. Simulation results showed that all the proposed
precoding designs are effective compared with conventional schemes.

Footnote: It implies that the "Identity" relay precoding matrix is usually a good initialization point, as in the multi-data-stream case.

Footnote: For the ite-precoding method, only the relay precoder is a matrix, while the two source precoders are actually vectors. Here, with a slight abuse of notation, the "Identity" source precoder means the vector with equal entries.

Footnote: Note that, different from the multi-data-stream iterative precoding, we find that the "Identity" source precoding vector is not a good initialization point.

Appendix A 
Proof of Lemma 1

We first show that the objective function in (15) is a convex function. Since the sum of two convex functions is still a convex function, the convexity of $J_{r_1} + J_{r_2}$ can be verified by showing that $J_{r_1}$ and $J_{r_2}$ are both convex. We take $J_{r_1}$ as the example to illustrate the proof; the extension to $J_{r_2}$ is straightforward. For notational simplicity, we define $R_1 = G_1^H W_1^H W_1 G_1$, $R_2 = H_2 A_2 W_1 G_1$ and $a = \mathrm{Tr}\{\sigma_1^2 W_1 W_1^H + I_N\}$. By applying matrix manipulations in [36, Eq. 1.10.62, Eq. 1.10.64], $J_{r_1}$ can be reformulated as

$$J_{r_1} = a_r^H \left(R_{x_2}^T \otimes R_1\right) a_r - \mathrm{vec}(R_2^T)^T a_r - \mathrm{vec}(R_2^H)^T a_r^* + a,$$

where $a_r = \mathrm{vec}(A_r)$. Based on the vector differential rule in [29], the four Hessian matrices as defined in [37] are derived as

$$\mathcal{H}_{a_r, a_r^*} J_{r_1} = \left(R_{x_2}^T \otimes R_1\right)^T, \quad \mathcal{H}_{a_r^*, a_r} J_{r_1} = R_{x_2}^T \otimes R_1, \quad \mathcal{H}_{a_r, a_r} J_{r_1} = 0, \quad \mathcal{H}_{a_r^*, a_r^*} J_{r_1} = 0.$$

In order to show the convexity of $J_{r_1}$, the following block matrix should be positive semidefinite:

$$\mathcal{H}(J_{r_1}) = \begin{bmatrix} R_{x_2}^T \otimes R_1 & 0 \\ 0 & \left(R_{x_2}^T \otimes R_1\right)^T \end{bmatrix}.$$

Before confirming the positive semidefiniteness of $\mathcal{H}(J_{r_1})$, we introduce the following lemma.

Lemma A: The Kronecker product of any two positive semidefinite matrices is also positive semidefinite.

Proof: Let $Z_1$ and $Z_2$ be any two positive semidefinite matrices. We can decompose them into $Z_1 = Z_1^{1/2} Z_1^{1/2}$ and $Z_2 = Z_2^{1/2} Z_2^{1/2}$, where $Z_1^{1/2}$ and $Z_2^{1/2}$ are also both positive semidefinite matrices. Applying the rule $AB \otimes CD = (A \otimes C)(B \otimes D)$, we have

$$Z = Z_1 \otimes Z_2 = \left(Z_1^{1/2} Z_1^{1/2}\right) \otimes \left(Z_2^{1/2} Z_2^{1/2}\right) = \left(Z_1^{1/2} \otimes Z_2^{1/2}\right)\left(Z_1^{1/2} \otimes Z_2^{1/2}\right).$$

Since $Z_1^{1/2} \otimes Z_2^{1/2}$ is Hermitian, we conclude that the matrix $Z$ is positive semidefinite. ■
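Lemma A is easy to confirm numerically. The following NumPy sketch is only an illustration: the helper `random_psd` and the matrix sizes are arbitrary assumptions. It also uses the standard fact that the eigenvalues of a Kronecker product are the pairwise products of the eigenvalues of its factors.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_psd(n, rank):
    """Hypothetical helper: random Hermitian positive semidefinite matrix of given rank."""
    X = rng.standard_normal((n, rank)) + 1j * rng.standard_normal((n, rank))
    return X @ X.conj().T

Z1 = random_psd(3, 2)        # rank-deficient, so some eigenvalues are zero
Z2 = random_psd(4, 4)
Z = np.kron(Z1, Z2)          # Kronecker product, Hermitian by construction

eig_Z = np.linalg.eigvalsh(Z)                 # ascending real eigenvalues
min_eig = eig_Z.min()

# Pairwise products of the factor eigenvalues, sorted for comparison
eig_products = np.sort(np.outer(np.linalg.eigvalsh(Z1),
                                np.linalg.eigvalsh(Z2)).ravel())
```

Up to rounding error, `min_eig` is non-negative and `eig_Z` matches `eig_products`.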
By applying Lemma A, we derive that the matrix $R_{x_2}^T \otimes R_1$ is positive semidefinite since both $R_{x_2}^T$ and $R_1$ are positive semidefinite. Then, $\mathcal{H}(J_{r_1})$ is positive semidefinite. Hence, the convexity of $J_{r_1}$ is proven. The same result holds for $J_{r_2}$. Thus we conclude that the objective function $J_{r_1} + J_{r_2}$ is convex.

Next, we prove that the feasible set provided by $\mathrm{Tr}\{A_r R_x A_r^H\} \le \tau_r$ is convex. This can be alternatively proven by checking the convexity of the function $f = \mathrm{Tr}\{A_r R_x A_r^H\}$ [38]. Similar to the previous manipulation, $f$ can be re-expressed as $f = a_r^H \left(R_x^T \otimes I_M\right) a_r$. In addition, the corresponding four Hessian matrices are derived as

$$\mathcal{H}_{a_r, a_r^*} f = \left(R_x^T \otimes I_M\right)^T, \quad \mathcal{H}_{a_r^*, a_r} f = R_x^T \otimes I_M, \quad \mathcal{H}_{a_r, a_r} f = 0, \quad \mathcal{H}_{a_r^*, a_r^*} f = 0.$$

Applying Lemma A, we can also show that the block matrix $\mathcal{H}(f)$ is positive semidefinite. Thus, we derive that the feasible set in (15) is convex. Since both the objective function and the feasible set are convex, the optimization problem (15) is a convex problem.

Appendix B 
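Proof of Lemma 2

Since there exists an inverse operator outside the Lagrangian multiplier $\lambda$, it is easy to verify that $g$ decreases with $\lambda$. Next we mainly focus on deriving the upper bound of $\lambda$. To this end, we first assume that $R_r$ can be divided into two parts as $R_r = Q_1 + Q_2$, and let

$$Q_1 = R_{w_1} A_r^{opt} R_{x_2} + R_{w_2} A_r^{opt} R_{x_1}, \quad Q_2 = \lambda^{opt} A_r^{opt} R_x, \tag{41}$$

where $A_r^{opt}$ and $\lambda^{opt}$ are the optimal primal and dual solutions of (15). Applying (41), we have

$$A_r^{opt} = \frac{1}{\lambda^{opt}} Q_2 R_x^{-1}. \tag{42}$$

Substituting (42) into the power constraint (20) so that the equality is satisfied, we have

$$\mathrm{Tr}\left\{A_r^{opt} R_x A_r^{opt\,H}\right\} = \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_2 R_x^{-1} R_x R_x^{-1} Q_2^H\right\} = \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_2 R_x^{-1} Q_2^H\right\} = \tau_r.$$

On the other hand, we have

$$\begin{aligned} \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} R_r R_x^{-1} R_r^H\right\} &= \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} (Q_1 + Q_2) R_x^{-1} (Q_1 + Q_2)^H\right\} \\ &= \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_1 R_x^{-1} Q_1^H\right\} + \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_2 R_x^{-1} Q_2^H\right\} \\ &\quad + \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_1 R_x^{-1} Q_2^H\right\} + \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_2 R_x^{-1} Q_1^H\right\}. \end{aligned} \tag{43}$$

Since $\mathrm{Tr}\{Z_1 Z_2\} \ge 0$ holds for any positive semidefinite $Z_1$ and $Z_2$, we conclude that $\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_1 R_x^{-1} Q_1^H\right\}$ in (43) is larger than or at least equal to zero. Next we prove $\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_1 R_x^{-1} Q_2^H\right\} \ge 0$. Based on the definition in (41), we have

$$\mathrm{Tr}\left\{Q_1 A_r^{opt\,H}\right\} = \mathrm{Tr}\left\{R_{w_1} A_r^{opt} R_{x_2} A_r^{opt\,H} + R_{w_2} A_r^{opt} R_{x_1} A_r^{opt\,H}\right\} \ge 0. \tag{44}$$

Substituting (42) into (44), we obtain

$$\mathrm{Tr}\left\{Q_1 A_r^{opt\,H}\right\} = \mathrm{Tr}\left\{\frac{1}{\lambda^{opt}} Q_1 R_x^{-1} Q_2^H\right\}.$$

Thus, we conclude that $\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_1 R_x^{-1} Q_2^H\right\} \ge 0$ (the same holds for $\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_2 R_x^{-1} Q_1^H\right\}$). Since all terms in (43) are larger than or at least equal to zero, we conclude

$$\mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} R_r R_x^{-1} R_r^H\right\} \ge \mathrm{Tr}\left\{\frac{1}{(\lambda^{opt})^2} Q_2 R_x^{-1} Q_2^H\right\} = \tau_r, \quad \text{i.e.,} \quad \lambda^{opt} \le \sqrt{\frac{\mathrm{Tr}\{R_r R_x^{-1} R_r^H\}}{\tau_r}}.$$

Thus, the proof of Lemma 2 is completed.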



Appendix C 
Proof of Lemma 3 

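By using the rule of the trace operator $\mathrm{Tr}\{ABCD\} = \left(\mathrm{vec}(D^T)\right)^T \left(C^T \otimes A\right) \mathrm{vec}(B)$ in [36], $J_{s_i}$ can be reformulated as

$$J_{s_i} = a_i^H P_i a_i - 2\Re\{b_i^H a_i\} + \mathrm{Tr}\{R_{s_i,3}\}, \quad i = 1, 2, \tag{45}$$

where $P_i = I_N \otimes R_{s_i,1}$, $b_i = \mathrm{vec}(R_{s_i,2}^H)$ and $a_i = \mathrm{vec}(A_i)$. Again, it is known from Lemma A that $P_i$ is a positive semidefinite matrix. Thus, (45) can be transformed into

$$J_{s_i} = \left\|P_i^{1/2} a_i\right\|^2 - 2\Re\{b_i^H a_i\} + \mathrm{Tr}\{R_{s_i,3}\}, \quad i = 1, 2. \tag{46}$$

To further remove the $\Re(\cdot)$ operator, we define $\tilde{a}_i = \left[\Re\{a_i^T\}, \Im\{a_i^T\}\right]^T$, $i = 1, 2$, and transform (46) into

$$J_{s_i} = \tilde{a}_i^T \tilde{P}_i \tilde{a}_i - 2\tilde{b}_i^T \tilde{a}_i + \mathrm{Tr}\{R_{s_i,3}\}, \quad i = 1, 2,$$

where $\tilde{P}_i = \hat{P}_i^T \hat{P}_i$ with

$$\hat{P}_i = \begin{bmatrix} \Re\{P_i^{1/2}\} & -\Im\{P_i^{1/2}\} \\ \Im\{P_i^{1/2}\} & \Re\{P_i^{1/2}\} \end{bmatrix}, \quad \tilde{b}_i = \left[\Re\{b_i^H\}, -\Im\{b_i^H\}\right]^T, \quad i = 1, 2.$$

It is easy to verify that $\tilde{P}_i$ is a positive semidefinite matrix. Similarly, for the three power constraints, we have $\mathrm{Tr}\{A_i A_i^H\} = \tilde{a}_i^T Q_i \tilde{a}_i$ with $Q_i = I_{2N^2 \times 2N^2}$, $i = 1, 2$, and $\mathrm{Tr}\{R_{p_1} A_1 A_1^H + R_{p_2} A_2 A_2^H\} = \tilde{a}_1^T Q_3 \tilde{a}_1 + \tilde{a}_2^T Q_4 \tilde{a}_2$ with $Q_3 = \hat{Q}_3^T \hat{Q}_3$ and $Q_4 = \hat{Q}_4^T \hat{Q}_4$ being two positive semidefinite matrices, where $\hat{Q}_3$ and $\hat{Q}_4$ are denoted as

$$\hat{Q}_3 = \begin{bmatrix} \Re\{(I_N \otimes R_{p_1})^{1/2}\} & -\Im\{(I_N \otimes R_{p_1})^{1/2}\} \\ \Im\{(I_N \otimes R_{p_1})^{1/2}\} & \Re\{(I_N \otimes R_{p_1})^{1/2}\} \end{bmatrix}, \quad \hat{Q}_4 = \begin{bmatrix} \Re\{(I_N \otimes R_{p_2})^{1/2}\} & -\Im\{(I_N \otimes R_{p_2})^{1/2}\} \\ \Im\{(I_N \otimes R_{p_2})^{1/2}\} & \Re\{(I_N \otimes R_{p_2})^{1/2}\} \end{bmatrix}.$$

Finally, by combining $\tilde{a}_1$ and $\tilde{a}_2$ as $a = [\tilde{a}_1^T, \tilde{a}_2^T]^T$, the optimization (23) takes the form

$$\begin{aligned} \min_{a} \quad & a^T P a - b^T a + \mathrm{Tr}\{R_{s_1,3} + R_{s_2,3}\} \\ \text{s.t.} \quad & a^T \bar{Q}_1 a \le \tau_1, \quad a^T \bar{Q}_2 a \le \tau_2, \end{aligned}$$

together with a third quadratic constraint on $a^T \bar{Q}_3 a$ whose right-hand side is the relay power bound of (23), where

$$P = \begin{bmatrix} \tilde{P}_1 & 0 \\ 0 & \tilde{P}_2 \end{bmatrix}, \quad b = \left[2\tilde{b}_1^T, 2\tilde{b}_2^T\right]^T, \quad \bar{Q}_1 = \begin{bmatrix} Q_1 & 0 \\ 0 & 0 \end{bmatrix}, \quad \bar{Q}_2 = \begin{bmatrix} 0 & 0 \\ 0 & Q_2 \end{bmatrix}, \quad \bar{Q}_3 = \begin{bmatrix} Q_3 & 0 \\ 0 & Q_4 \end{bmatrix}.$$

Since $P$ and $\bar{Q}_i$, $i = 1, 2, 3$, are positive semidefinite, by definition the optimization problem (23) is transformed into a convex QCQP problem.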

Appendix D 
Proof of Lemma 5 

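Due to the similarity between $J_1$ and $J_2$, we next focus on deriving the upper bound of $J_1$; similar results hold for $J_2$. By defining $C = \sigma_1^2 B_{g_1} + \sigma_r^2 \Lambda_g \Lambda_{A_r} B_h \Lambda_{A_r} \Lambda_g$ and $D = \Lambda_g \Lambda_{A_r} \Lambda_{h_2} \Lambda_{A_2}$, the MSE in (31) is rewritten as

$$J_1 = \mathrm{Tr}\left\{\left[I_N + D C^{-1} D\right]^{-1}\right\} = \mathrm{Tr}\left\{I_N - \left[I_N + D^{-1} C D^{-1}\right]^{-1}\right\},$$

where we have used the matrix inversion lemma $(I + A^{-1})^{-1} = I - (I + A)^{-1}$. Since for any positive definite square matrix $A$ it holds that $\mathrm{Tr}\{A^{-1}\} \ge \sum_i [A(i,i)]^{-1}$ [24], we thus have

$$J_1 \le N - \sum_{i=1}^{N} \left[\left(I_N + D^{-1} C D^{-1}\right)(i,i)\right]^{-1} = \mathrm{Tr}\left\{I_N - \left(I_N + D^{-1} \Lambda_C D^{-1}\right)^{-1}\right\} = \mathrm{Tr}\left\{\left[I_N + D \Lambda_C^{-1} D\right]^{-1}\right\},$$

where $\Lambda_C$ is the diagonal matrix that contains the diagonal entries of $C$. Thus, Lemma 5 is proven.

Appendix E
Deriving the conclusion in (36)

The Lagrangian function of (35) is given as

$$L = \sum_{n=1}^{N} \sum_{i=1}^{2} \frac{\sigma_i^2 \lambda_{B_{g_i}}^n + \sigma_r^2 \lambda_{B_h}^n p_g^n p_{A_r}^n}{\sigma_i^2 \lambda_{B_{g_i}}^n + \sigma_r^2 \lambda_{B_h}^n p_g^n p_{A_r}^n + p_{h_j}^n p_g^n p_{A_j}^n p_{A_r}^n} + \mu \left[\sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) - \tau_r\right] - \sum_{n=1}^{N} \beta^n p_{A_r}^n, \tag{47}$$

where $\mu$ and $\beta^n$ are Lagrangian multipliers. The resultant set of KKT conditions is obtained as

$$-\sum_{i=1}^{2} \frac{\sigma_i^2 \lambda_{B_{g_i}}^n p_{h_j}^n p_g^n p_{A_j}^n}{\left[\sigma_i^2 \lambda_{B_{g_i}}^n + p_{A_r}^n \left(\sigma_r^2 \lambda_{B_h}^n p_g^n + p_{h_j}^n p_g^n p_{A_j}^n\right)\right]^2} + \mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) - \beta^n = 0, \tag{48}$$

$$\mu \left[\sum_{n=1}^{N} p_{A_r}^n \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) - \tau_r\right] = 0, \quad \beta^n p_{A_r}^n = 0, \ \forall n, \quad \mu \ge 0, \quad \beta^n \ge 0, \ \forall n. \tag{49}$$

Based on (48) and (49), we have

$$p_{A_r}^n \left[\mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) - \sum_{i=1}^{2} \frac{\sigma_i^2 \lambda_{B_{g_i}}^n p_{h_j}^n p_g^n p_{A_j}^n}{\left[\sigma_i^2 \lambda_{B_{g_i}}^n + p_{A_r}^n \left(\sigma_r^2 \lambda_{B_h}^n p_g^n + p_{h_j}^n p_g^n p_{A_j}^n\right)\right]^2}\right] = p_{A_r}^n \beta^n = 0. \tag{50}$$

To satisfy (50), we discuss the following cases:

If $\mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) > \dfrac{p_{h_2}^n p_g^n p_{A_2}^n}{\sigma_1^2 \lambda_{B_{g_1}}^n} + \dfrac{p_{h_1}^n p_g^n p_{A_1}^n}{\sigma_2^2 \lambda_{B_{g_2}}^n}$, we must have $p_{A_r}^n = 0$.

Else, $\mu \left(p_{h_1}^n p_{A_1}^n + p_{h_2}^n p_{A_2}^n + \sigma_r^2 \lambda_{B_h}^n\right) \le \dfrac{p_{h_2}^n p_g^n p_{A_2}^n}{\sigma_1^2 \lambda_{B_{g_1}}^n} + \dfrac{p_{h_1}^n p_g^n p_{A_1}^n}{\sigma_2^2 \lambda_{B_{g_2}}^n}$. Combining this with the condition $\beta^n \ge 0$, (50) can only be fulfilled with $p_{A_r}^n \ge 0$. This implies the equation (37) given earlier. Since the left-hand side of (37) is a monotonic function of $p_{A_r}^n$ within $(0, +\infty)$, we choose the only positive root of (37) as $p_{A_r}^n$. By combining the two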
cases, we derive the conclusion in (36).

Fig. 1. Illustration of the MIMO two-way relay system.

Fig. 3. Performance comparison of the iterative algorithm with different initialization points at $N = M = 2$.

Fig. 4. Performance comparison of the iterative algorithm with different initialization points at $N = M = 4$.

Fig. 5. The MSE performance comparison with $\rho_1 = \rho_2 = \rho_r$ at $N = M = 2$.

Fig. 6. The BER performance comparison with $\rho_1 = \rho_2 = \rho_r$ at $N = M = 2$.

Fig. 7. The BER performance comparison for different relay antenna numbers.

Fig. 8. The BER performance comparison with [14].

Fig. 9. The BER performance comparison with $\rho_1 = \rho_2 = \rho_r$ at $N = M = 2$ for single-data-stream transmission.